Abstract

Considering  two  classes  of  vehicles,  we  aim  to  identify  the  physical  elements  of  the 
vehicles  with  the  most  impact  on  identifying  the  class  of  the  vehicle  in  synthetic  aperture 
radar  (SAR)  images.  We  classify  vehicles  using  features,  from  polarimetric  SAR  images, 
corresponding  to  the  structure  of  physical  elements.  We  demonstrate  a  method  which 
determines  the  most  impactful  features  to  classification  by  applying  subset  selection  on 
the  features.  Determination  of  the  most  impactful  elements  of  the  vehicles  is  beneficial  to 
the  development  of  low  observables,  target  models,  and  automatic  target  recognition  (ATR) 
algorithms. 

We  show  how  previous  work  with  features  from  individual  pixels  is  applied  to  a  greater 
number  of  target  states.  At  a  greater  number  of  target  states,  the  previous  work  has  poor 
classification  performance.  Additionally,  the  nature  of  the  features  from  pixels  limits  the 
identification  of  the  most  impactful  elements  of  vehicles.  We  apply  concepts  from  optical 
sensing  to  reduce  the  limitation  on  identification  of  physical  elements. 

We  draw  from  optical  sensing  feature  extraction  with  the  use  of  Histogram  of 
Oriented  Gradients  (HOG).  From  the  cells  of  HOG,  we  form  features  from  frequency 
and polarization attributes of SAR images. Using a subset of features, we achieve a
classification  performance  of  96.10  percent  correct  classification.  Using  the  features  from 
HOG  and  the  cells,  we  identify  the  features  with  the  most  impact. 

Using  backward  selection,  a  process  for  subset  selection,  we  identify  the  features  with 
the  most  impact  to  classification.  The  execution  of  backward  selection  removes  the  features 
which  induce  the  most  error  in  classification.  We  report  features  extracted  from  polarization 
attributes  of  SAR  images  have  the  most  positive  impact  on  classification  performance. 
ANALYSIS OF FEATURES FOR SYNTHETIC APERTURE RADAR TARGET CLASSIFICATION

I.  Introduction 

Considering  two  classes  of  vehicles  in  Figure  1.1,  we  aim  to  identify  the  physical 
elements  of  the  vehicles  with  the  most  impact  on  identifying  the  class  from  synthetic 
aperture  radar  (SAR)  images.  We  classify  vehicles  using  features,  extracted  from 
polarimetric  SAR  images,  corresponding  to  the  structure  of  physical  elements.  We 
demonstrate  a  method  which  determines  the  most  impactful  features  to  classification  by 
applying  a  subset  selection  on  the  features.  Determination  of  the  most  impactful  elements 
of  the  vehicles  is  beneficial  to  the  development  of  low  observables,  target  models,  and 
automatic  target  recognition  (ATR)  algorithms. 


[Figure: images of the ten vehicles: Camry, HondaCivic4dr, Maxima, Mitsubishi, Sentra, ToyotaAvalon, Jeep93, Jeep99, MazdaMPV, ToyotaTacoma.]


Figure  1.1:  CV  Domes  Vehicles  [1]. 


Various  processes  are  used  to  identify  targets  within  SAR  images.  Some  processes 
chip  the  SAR  images  and  then  correlate  the  chips  with  a  dictionary  of  chips  to  identify  the 
class [2]. Other processes use principal component analysis to define the SAR images and
a nearest neighbor classifier to identify the class [3, 4]. The diameter, inertia, percent bright
constant false alarm rate (CFAR), and fractal dimension of the target in the image have been
used and compared to training data to determine the class [5]. Another process fingerprints
SAR images, utilizing machine learning to identify the class [6]. These processes
all assess the information from the entire SAR image to identify a target class.
Work  performed  by  Flynn  [7]  used  information  from  a  single  pixel  in  the  SAR  image  and 
machine  learning  to  identify  the  class. 

This thesis explores the pixel-based features used by Flynn [7] and full image
classification. Flynn's pixel-based features tie information back to physical elements of
the vehicles, but do not classify using the entire image. Full image classification utilizes
the  entire  SAR  image,  but  is  limited  in  tying  the  features  used  to  physical  elements  of  the 
vehicle.  We  extend  the  work  performed  by  Flynn  [7]  to  extract  features  from  the  entire 
image.  We  also  explore  the  impact  of  features  on  classification  of  the  vehicle  through  the 
use  of  backward  selection. 

SAR  ATR  is  a  challenging  problem.  Inherent  to  the  nature  of  SAR,  the  images  suffer 
from  low  resolution  compared  to  other  imaging  systems  [8].  Also,  the  targets  contain 
multiple  states  encompassing  360  degrees  in  azimuth  and  a  change  in  elevation  dependent 
upon the concept of operations. As such, classification of SAR images requires a
high-dimensional feature space, which is computationally intensive. To evaluate classification
performance,  the  Air  Force  Research  Laboratory  (AFRL)  high  performance  computing 
(HPC)  resources  are  used  in  the  execution  of  this  research  [9]. 

Through  subset  selection,  we  show  that  classification  performance  is  improved  using 
features  extracted  from  cells  of  an  image.  Subset  selection  also  identifies  features  with  the 
most  impact  on  classification.  Analysis  of  these  features  may  expose  the  impact  of  physical 
elements  of  the  vehicles  on  SAR  ATR. 
1.1 Thesis Organization

This  thesis  is  organized  into  five  chapters.  Chapter  II  covers  the  tools,  techniques, 
and  algorithms  used  to  support  the  analysis  of  the  impact  of  features.  Previous  research 
by  Flynn  [7]  is  also  reviewed  in  Chapter  II.  Chapter  III  covers  the  application  of  pixel  and 
cell  methods  for  feature  extraction  from  SAR  images.  Flynn’s  research  [7]  is  extended 
in  Chapter  III.  Chapter  IV  covers  the  methodology  behind  backward  selection  and  the 
evaluation  of  the  impact  of  features.  Chapter  V  completes  this  thesis  with  final  conclusions 
and  recommendations  for  future  work. 
II. Background


The  goal  of  this  chapter  is  to  build  and  define  the  foundation  of  knowledge  used  in 
the  development  and  execution  of  this  research.  Section  2.1  introduces  the  ATR  process 
and  focuses  this  thesis  on  feature  formation.  Section  2.2  introduces  the  nature  of  features, 
attributes,  and  feature  vectors.  Section  2.3  introduces  the  spectrum  parted  linked  image 
test  (SPLIT)  algorithm  for  attribute  extraction  from  SAR  images.  Section  2.4  introduces 
the  classifiers  we  use  in  this  thesis.  Section  2.5  reviews  how  previous  work  by  Flynn  [7] 
formed  features  and  used  classifiers  with  SAR  images. 

2.1  ATR  Overview 

The ATR process is shown in Figure 2.1. Data is
processed into a form where targets are detected. The detected targets are segmented
from the data and features are extracted. Using the features, the target is classified. The
classification of the target is used to impact system or mission parameters in real time [10].
In this thesis, we focus on the fourth stage of the ATR process (i.e., feature computation,
selection, and classification).


Figure  2.1:  Block  Diagram  of  a  Typical  ATR  System  [10]. 


2.2  Feature  Computation,  Selection,  and  Classification 

Features,  formed  from  the  attributes  of  SAR  images,  are  applied  in  the  ATR  process  to 
classify  targets.  Chapter  III  evaluates  two  ways  to  form  features  from  the  attributes  in  SAR 
images. The full set of features used to classify targets is a feature vector. The relationship
between  attributes,  features,  and  feature  vectors  is  illustrated  in  Figure  2.2. 


[Diagram: Attributes 1 through 10 are formed into features, which populate the feature vector.]

Figure  2.2:  Relation  of  Attributes,  Features,  and  Feature  Vectors. 


Figure  2.3  shows  the  generic  process  we  use  in  this  thesis  to  evaluate  the  classification 
performance  of  feature  vectors.  The  first  step  extracts  attributes  from  the  data.  The  second 
step  formulates  the  extracted  attributes  into  features.  Third,  a  feature  vector  is  populated 
from  a  subset  of  all  possible  features.  The  fourth  step  is  to  train  and  test  on  the  feature 
vector  using  a  classifier  to  arrive  at  a  metric  of  performance  for  the  feature  vector. 

In Chapter III, we use SPLIT to extract the odd bounce polarization response attribute,
$k_o$, from SAR images. Using the pixel method, we form a feature directly from the extracted
$k_o$ pixel value. Using the cell method, we form a feature from the mean of the extracted
$k_o$ pixel values over a spatial region of pixels called a cell. Other features are formed from
different  attributes.  A  combination  of  features  forms  a  feature  vector.  The  percent  correct 
classification  of  targets  is  used  as  the  metric  of  classification  performance  corresponding  to 
the  feature  vector. 
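The difference between the pixel method and the cell method can be sketched as follows. This is a minimal illustration, assuming square, non-overlapping cells and a small hand-made attribute image; the function names, the cell size, and the sample values are hypothetical and are not taken from SPLIT or from Chapter III.

```python
from statistics import mean


def pixel_features(attr_image):
    """Pixel method: every extracted pixel value is used directly as a feature."""
    return [value for row in attr_image for value in row]


def cell_features(attr_image, cell_size):
    """Cell method: one feature per cell, the mean of the pixel values in the cell.

    Assumes square, non-overlapping cells (an illustrative choice).
    """
    n_rows, n_cols = len(attr_image), len(attr_image[0])
    features = []
    for r0 in range(0, n_rows, cell_size):
        for c0 in range(0, n_cols, cell_size):
            cell = [attr_image[r][c]
                    for r in range(r0, min(r0 + cell_size, n_rows))
                    for c in range(c0, min(c0 + cell_size, n_cols))]
            features.append(mean(cell))
    return features


# Hypothetical 4x4 image of an extracted attribute (for example, k_o per pixel).
k_o_image = [[0.0, 0.2, 1.0, 1.0],
             [0.2, 0.4, 1.0, 1.0],
             [0.5, 0.5, 0.0, 0.0],
             [0.5, 0.5, 0.0, 0.0]]

print(len(pixel_features(k_o_image)))   # one feature per pixel
print(cell_features(k_o_image, 2))      # one feature per 2x2 cell
```

The pixel method yields sixteen features for this image, while the cell method with 2x2 cells yields four, each summarizing a spatial region.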
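[Block diagram: attributes are extracted from the data, formed into features, and populated into a feature vector, which is trained and tested with a classifier to yield the classification performance of the feature vector.]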

Figure  2.3:  Block  Diagram  of  Classification  Process. 


2.3  Attribute  Extraction 

In  this  thesis,  we  use  the  SPLIT  algorithm  developed  by  Fuller  and  Saville  [11]  for 
the  extraction  of  attributes.  SPLIT  constructs  a  set  of  three  2-D  sub-images  for  each 
polarization  channel  of  the  target  in  the  x-y  plane  using  a  form  of  back-projection  [11]. 
The  sub-images  are  related  to  the  frequency  spectrum  of  the  phase  history.  The  phase 
history  is  filtered  into  overlapping  frequency  banks.  The  first  frequency  bank  is  the  first 
half  of  the  bandwidth,  the  second  is  the  middle  two-fourths  of  the  bandwidth,  and  the  third 
is  the  second  half  of  the  bandwidth. 
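The overlapping frequency banks described above can be sketched as band edges. This is an illustration only, assuming the bandwidth is given by a start and stop frequency in arbitrary units; the function names and the example values are hypothetical and do not come from SPLIT.

```python
def split_sub_bands(f_start, f_stop):
    """Return (low, high) edges of the three overlapping frequency banks.

    Bank 1: first half of the bandwidth.
    Bank 2: middle two-fourths of the bandwidth.
    Bank 3: second half of the bandwidth.
    """
    bandwidth = f_stop - f_start
    return [
        (f_start, f_start + bandwidth / 2),
        (f_start + bandwidth / 4, f_start + 3 * bandwidth / 4),
        (f_start + bandwidth / 2, f_stop),
    ]


def band_center(band):
    """Center frequency of a (low, high) band."""
    low, high = band
    return (low + high) / 2


# Hypothetical bandwidth from 0 to 4 in arbitrary frequency units.
bands = split_sub_bands(0.0, 4.0)
centers = [band_center(b) for b in bands]
print(bands)
print(centers)
```

Each bank covers half of the full bandwidth, and adjacent banks overlap by one quarter of it, so the three center frequencies are evenly spaced.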

Peaks  that  are  a  result  of  canonical  scatterers  are  stable  in  location  across  sub-images 
of the back projection [12]. Attributes are extracted for the stable peak pixels in the sub-images.
As such, identification of the stable peaks in a target area is the first step in attribute
extraction.  SPLIT  uses  a  watershed  technique  on  the  sub-images  to  identify  peaks  within 
each  sub-image.  The  threshold  of  the  watershed  technique  is  variable  within  SPLIT.  A  low 
threshold rejects peaks with an amplitude more than 1 dB below the maximum peak in the
image.  A  medium  threshold  rejects  peaks  with  an  amplitude  more  than  10  dB  below  the 
maximum  peak  in  the  image.  A  high  threshold  rejects  peaks  with  an  amplitude  more  than 
32  dB  below  the  maximum  peak.  The  frequency  response  attribute  is  extracted  from  the 
peak  pixels.  We  use  the  high  threshold  in  this  thesis.  Work  with  ATR  and  SPLIT  reports 
classification  performance  is  best  using  a  high  threshold  [13]. 
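The effect of the threshold setting can be sketched as a simple filter on peak amplitudes. This is a hedged illustration, not the SPLIT implementation: it assumes the peaks have already been found, that amplitudes are positive and voltage-like (so decibels are computed as 20 log10 of the amplitude ratio), and the function name and sample amplitudes are hypothetical.

```python
import math


def keep_peaks(peak_amplitudes, threshold_db):
    """Keep peaks whose amplitude is no more than threshold_db below the maximum peak.

    Assumes positive, voltage-like amplitudes (dB = 20 * log10 of the ratio).
    """
    max_amp = max(peak_amplitudes)
    kept = []
    for amp in peak_amplitudes:
        db_below_max = 20.0 * math.log10(max_amp / amp)
        if db_below_max <= threshold_db:
            kept.append(amp)
    return kept


# Hypothetical peak amplitudes: 0 dB, about 6 dB, 20 dB, and 40 dB below the maximum.
peaks = [1.0, 0.5, 0.1, 0.01]
print(keep_peaks(peaks, 10.0))  # medium threshold
print(keep_peaks(peaks, 32.0))  # high threshold
```

A larger threshold value admits weaker peaks, so the high setting passes more candidate scattering centers on to attribute extraction than the medium setting.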

SPLIT extracts $\alpha$ from each co-polarization channel, and $[k_e, k_o]$ from each of the
sub-images [11]. The final $\alpha$ for the pixel is the weighted average of the attribute across the
co-polarization channels. The weight of each $\alpha$ is the magnitude of the pixel amplitude
related to the $\alpha$. The final $[k_e, k_o]$ for the pixel is the weighted average of the attributes
across the sub-images. Each attribute is described in detail in Subsections 2.3.1-2.3.3.

2.3.1  Amplitude  Attribute. 

The amplitude attributes of the pixels relate to the scatterers of a target and are displayed
as SAR images. SPLIT forms images from the horizontal polarization, $P_{HH}$, vertical
polarization, $P_{VV}$, and cross polarization, $P_{HV}$, channels using a form of backprojection [11].
The combination of the three images, $[P_{HH}, P_{VV}, P_{HV}]$, forms the final image, $I$, where the
pixel amplitudes are the extracted amplitude attributes for the image. The three images are
combined as [11]

$$I = |P_{HH}|^2 + |P_{VV}|^2 + |P_{HV}|^2. \tag{2.1}$$
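The channel combination in Equation (2.1) can be sketched per pixel as below. This is an illustration only, assuming each channel image is a nested list of complex pixel values with identical dimensions; the function name and sample values are hypothetical.

```python
def combine_channels(p_hh, p_vv, p_hv):
    """Combine three polarization channel images into one amplitude image.

    Each output pixel is |P_HH|^2 + |P_VV|^2 + |P_HV|^2 for that pixel.
    """
    combined = []
    for row_hh, row_vv, row_hv in zip(p_hh, p_vv, p_hv):
        combined.append([abs(hh) ** 2 + abs(vv) ** 2 + abs(hv) ** 2
                         for hh, vv, hv in zip(row_hh, row_vv, row_hv)])
    return combined


# Hypothetical 1x2 complex channel images.
p_hh = [[3 + 4j, 0j]]
p_vv = [[1j, 2 + 0j]]
p_hv = [[0j, 0j]]
image = combine_channels(p_hh, p_vv, p_hv)
print(image)
```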

2.3.2  Frequency  Response  Attribute. 

The frequency response attribute is extracted from the change in pixel amplitude
across the sub-images. The frequency response ties back to the curvature of the physical
element of the target. Physical elements that are doubly curved, such as a sphere, have an
approximate $\alpha$ value of zero [11, 14]. Physical elements that are singly curved, such as
a cylinder, have an approximate $\alpha$ value of one or negative one [11, 14]. Finally, target
elements that are a corner reflector, such as a trihedral, have an approximate $\alpha$ value of
two or negative two [11, 14]. The isotropic model for the frequency response is shown as
[11, 15, 16]

$$S_f(f, A, \alpha) = A \left( j \frac{f}{f_c} \right)^{\alpha}, \tag{2.2}$$

where $f$ is the frequency of the waveform, and $A$ is a complex value related to the radar
cross section of the point. The relationship between curvature and the $\alpha$ value is illustrated
in Figure 2.4. SPLIT uses an iterative curve fitting algorithm to estimate $\alpha$. The amplitudes
of the pixel in the three sub-images define the curve to which $\alpha$ is fit. The iteration
minimizes the residual between the estimated amplitude
curve and the measured amplitudes of the sub-images [11].


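The $\alpha$ attribute for a pixel is fit to minimize the norm of the residual for the $k$th iteration,
expressed as $\frac{\boldsymbol{\sigma}}{\eta_k} - \mathbf{f}(\alpha_k)$ [11]. The normalization frequency vector, $\mathbf{f}(\alpha_k)$, is expressed as
[11]

$$\mathbf{f}(\alpha_k) = \frac{\left[ (f_{c1})^{\alpha_k+2},\ (f_{c2})^{\alpha_k+2},\ (f_{c3})^{\alpha_k+2} \right]^T}{(f_c)^{\alpha_k+2}}, \tag{2.3}$$

where $f_{c1}$ is the center frequency of subimage 1, $f_{c2}$ is the center frequency of subimage
2, $f_{c3}$ is the center frequency of subimage 3, and $f_c$ is the center frequency of the full
bandwidth. The observation vector, $\boldsymbol{\sigma}$, is expressed as [11]

$$\boldsymbol{\sigma} = \left[ |a_1|^2,\ |a_2|^2,\ |a_3|^2 \right]^T, \tag{2.4}$$

where $a_1$ is the amplitude of the pixel in subimage 1, $a_2$ is the amplitude of the pixel in
subimage 2, and $a_3$ is the amplitude of the pixel in subimage 3. The normalization factor,
$\eta_k$, is expressed as [11]

$$\eta_k = \frac{\boldsymbol{\sigma}^T \boldsymbol{\sigma}}{\boldsymbol{\sigma}^T \mathbf{f}(\alpha_k)}. \tag{2.5}$$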
[Figure: canonical scatterer shapes annotated with the signal frequency parameter, $|S_{uv}|^2 \propto f^{\alpha}$, where $u, v$ denote polarization. Specular reflection: $G = [\alpha, 1, 0, 0]$, $\alpha \in \{0, 1, 2\}$. Retro-reflection: $G = [\alpha, 0, 1, 0]$, $\alpha \in \{1, 2\}$. Trihedral: $G = [\alpha, 1, 0, 0]$, $\alpha \in \{2\}$. Diffraction: $G = [\alpha < 0, K_o, K_e, K_h]$. Note: $G = [\alpha, 0, 0, 1]$ is not represented by the depicted shapes.]

Figure  2.4:  Relationship  Between  Curvature  and  Frequency  Response  (used  with 
permission:  Dr.  Julie  Jackson)  [17]. 


The first iteration of the curve fitting method is initialized at [11]


a\  = 


'°gg 

los| 


2. 


(2.6) 


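The frequency parameter, $\alpha_k$, is adjusted by a scaled version of the norm of the residual,
expressed as [11]

$$\delta_k = (0.95)^k \left\| \frac{\boldsymbol{\sigma}}{\eta_k} - \mathbf{f}(\alpha_k) \right\|_2, \tag{2.7}$$

where $\alpha_{k+1}$ is expressed as

$$\alpha_{k+1} = \begin{cases} \alpha_k + \delta_k, & \left\| \frac{\boldsymbol{\sigma}}{\eta_k} - \mathbf{f}(\alpha_k + \delta_k) \right\| < \left\| \frac{\boldsymbol{\sigma}}{\eta_k} - \mathbf{f}(\alpha_k - \delta_k) \right\|, \\ \alpha_k - \delta_k, & \text{otherwise.} \end{cases}$$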
SPLIT applies a threshold to the initial frequency response parameter, $\alpha_1$, where if $\alpha_1 \notin
[-6, 6]$, then the estimation is considered to be not of a scattering center and the estimated
value is discarded [11]. SPLIT applies a threshold to $\delta_k$, where if $\delta_k < 0.001$, then $\alpha_{k+1}$ has
converged to the prescribed precision and is the finalized estimate of the frequency
response, $\alpha_K$ [11]. SPLIT then applies a threshold to the finalized estimate, $\alpha_K$, where
if $\alpha_K \notin [-4, 4]$, then the estimation is considered to be not of a scattering center and the
estimated value is discarded [11].

2.3.3  Polarization  Response  Attributes. 

The  polarization  response  attributes  are  extracted  by  SPLIT  from  the  relationship 
between  the  amplitude  of  the  three  polarization  channels.  The  characteristics  of  the 
physical  elements  of  the  target  affect  the  polarization  of  the  field  that  is  re-radiated  back 
to the radar [18]. Specifically, the presence and type of a corner reflector affects the
polarization of the re-radiated field. When a linearly polarized electric field is incident
on a flat perfect electric conductor (PEC), the reflected field maintains the polarization
characteristics of the incident field [18]. When a linearly polarized electric field is incident
upon a dihedral corner reflector, the component perpendicular to the reflector becomes
inverted [11]. Additionally, for a linearly polarized electric field incident on a trihedral
corner reflector, the reflected field maintains the polarization characteristics of the incident
field.  The  nature  of  the  polarization  response  is  documented  in  Figure  2.5  [11,  18]. 

Given  fully  polarimetric  SAR  data,  the  polarization  response  may  be  extracted. 
Fully  polarimetric  SAR  data  is  only  attainable  with  a  system  pre-configured  to  transmit, 
receive,  and  process  radar  waveforms  with  both  vertical  and  horizontal  polarization 
simultaneously.  Radar  returns  consisting  of  only  one  polarization  cannot  be  processed 
to  extract  a  polarization  response. 
Figure 2.5: Reflection Behavior for Linearly Polarized Electric Fields (used with
permission:  Dane  Fuller)  [11,  18]. 


The relationship between the incident field and the scattered or received field contains
the polarization response information. The relationship is defined as [19]

$$\mathbf{E}^s = \frac{e^{-jkr}}{\sqrt{4\pi}\, r} \mathbf{S} \mathbf{E}^i = \frac{e^{-jkr}}{\sqrt{4\pi}\, r} \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} \begin{bmatrix} E_H^i \\ E_V^i \end{bmatrix}, \tag{2.8}$$

where $\mathbf{E}^i$ is the incident electric field, $\mathbf{E}^s$ is the scattered or received electric field, $r$ is the
distance between the receive antenna and the scatterer, and $\mathbf{S}$ is the Sinclair Matrix. The
Sinclair Matrix is composed of horizontal and vertical, transmit and receive components.
It is important to note that $S_{HV}$ is equivalent to $S_{VH}$ in the case of a monostatic radar
[7, 15, 20]. The equivalence of $S_{HV}$ and $S_{VH}$ does not hold true in the case of bistatic
radar.
The attributes are decomposed using Krogager decomposition referencing circular
polarization. The Sinclair Matrix is translated to circular polarization, where $R$ is right-hand
polarized and $L$ is left-hand polarized, by [21]

$$S_{RR} = jS_{HV} + \frac{1}{2}(S_{HH} - S_{VV}), \tag{2.9}$$

$$S_{LL} = jS_{HV} - \frac{1}{2}(S_{HH} - S_{VV}), \tag{2.10}$$

$$S_{RL} = \frac{j}{2}(S_{HH} + S_{VV}). \tag{2.11}$$

From the circular polarization, measures of the odd, even, and helical scattering
mechanisms are extracted by the Krogager decomposition, given by [21]

$$k_e = \min(|S_{LL}|, |S_{RR}|), \tag{2.12}$$

$$k_o = |S_{RL}|, \tag{2.13}$$

$$k_h = \mathrm{abs}(|S_{RR}| - |S_{LL}|), \tag{2.14}$$

where $k_e$ is a coefficient of the even bounce mechanism, $k_o$ is a coefficient of the odd bounce
mechanism, and $k_h$ is a coefficient of the helical bounce mechanism. The finalized attributes
are the normalized coefficients of the bounce mechanisms, defined by

$$\frac{k_o}{\sqrt{|k_o|^2 + |k_e|^2 + |k_h|^2}}, \tag{2.15}$$

$$\frac{k_e}{\sqrt{|k_o|^2 + |k_e|^2 + |k_h|^2}}. \tag{2.16}$$


SPLIT does not extract the helical bounce mechanism because the helical mechanism can
be defined as the relationship between $k_e$ and $k_o$ as [11]


$$k_h = 1 - k_o - k_e. \tag{2.17}$$


We use the $k_e$ and $k_o$ attributes extracted for every pixel in Chapters III and IV. Once we have
the  extracted  amplitude,  frequency,  and  polarization  attributes  from  SPLIT,  we  form  them 
into  features  and  evaluate  the  classification  performance  using  machine  learning  classifiers. 
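The polarization attribute extraction of Equations (2.9) through (2.16) can be sketched for a single pixel as follows. This is a minimal illustration, assuming a monostatic measurement (so one cross-polarized term) and the commonly used linear-to-circular conversion of the Krogager decomposition; the function name and the canonical test matrices are hypothetical, and the weighted averaging across sub-images performed by SPLIT is not included.

```python
import math


def krogager_attributes(s_hh, s_hv, s_vv):
    """Return normalized (k_o, k_e) for one pixel from linear-basis Sinclair terms."""
    # Translate the linear-basis terms to the circular basis.
    s_rr = 1j * s_hv + 0.5 * (s_hh - s_vv)
    s_ll = 1j * s_hv - 0.5 * (s_hh - s_vv)
    s_rl = 0.5j * (s_hh + s_vv)

    # Even, odd, and helical bounce coefficients.
    k_e = min(abs(s_ll), abs(s_rr))
    k_o = abs(s_rl)
    k_h = abs(abs(s_rr) - abs(s_ll))

    # Normalize by the root sum of squares of the three coefficients.
    norm = math.sqrt(k_o ** 2 + k_e ** 2 + k_h ** 2)
    if norm == 0.0:
        return 0.0, 0.0  # no scattering energy in this pixel
    return k_o / norm, k_e / norm


# Idealized trihedral (odd bounce): co-polarized terms equal, no cross term.
print(krogager_attributes(1.0, 0.0, 1.0))
# Idealized dihedral (even bounce): co-polarized terms of opposite sign.
print(krogager_attributes(1.0, 0.0, -1.0))
```

For the idealized trihedral the response is entirely odd bounce, and for the idealized dihedral it is entirely even bounce, which matches the reflection behavior described for corner reflectors in this subsection.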
2.4 Classifiers


We use both a linear and a non-linear machine learning classifier to evaluate the
classification performance of feature vectors. The linear machine learning classifier we use
is linear discriminant analysis (LDA). The non-linear machine learning classifier we use is
relevance vector machine (RVM). Each classifier is described in detail in Subsections 2.4.1
and 2.4.2.

2.4.1  Linear  Discriminant  Analysis. 

Linear  discriminant  analysis  (LDA)  is  a  supervised  machine  learning  method  for 
dimensionality reduction, classification, and learning [22]. The process projects a
$P$-dimensional feature vector into a one-dimensional space. The process statistically
minimizes the variance of class data in the one-dimensional space, while maximizing
separation between classes. LDA is only applicable when $P$ is greater than or equal to
2.

LDA develops a projection, $\mathbf{w}$, such that [22]

$$z = \mathbf{w}^T \mathbf{x}, \tag{2.18}$$

where $\mathbf{x}$ is the feature vector, and $z$ is a point in one-dimensional space. The projection
vector, $\mathbf{w}$, is defined such that the classes are projected to maximize the separation between
classes and minimize the scatter within a class [22]. Given a two class comparison with $P$
total features and $N$ instances of each class, there exists the mean, $\mathbf{m}_1$, of $\mathbf{x}_{1,n} \in \mathbb{R}^P$ of class
1 and the mean, $\mathbf{m}_2$, of $\mathbf{x}_{2,n} \in \mathbb{R}^P$ of class 2. There also exists a projection of $\mathbf{m}_1$ and $\mathbf{m}_2$,
$m_1$ and $m_2$, such that [22]

$$m_1 = \mathbf{w}^T \mathbf{m}_1, \tag{2.19}$$

$$m_2 = \mathbf{w}^T \mathbf{m}_2. \tag{2.20}$$
The scatter within a class, $s_1^2$ for class 1 and $s_2^2$ for class 2, is characterized as

$$s_1^2 = \sum_{n=1}^{N} (\mathbf{w}^T \mathbf{x}_{1,n} - m_1)^2, \tag{2.21}$$

$$s_2^2 = \sum_{n=1}^{N} (\mathbf{w}^T \mathbf{x}_{2,n} - m_2)^2. \tag{2.22}$$

The objective is to maximize $|m_1 - m_2|$ and minimize $(s_1^2 + s_2^2)$ [22]. The $\mathbf{w}$ maximizing

$$J(\mathbf{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2} \tag{2.23}$$


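is Fisher's linear discriminant [22]. From the numerator, we derive the between-class
scatter matrix, $\mathbf{S}_B$, through [22]

$$\begin{aligned} (m_1 - m_2)^2 &= (\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)^2, \\ &= \mathbf{w}^T (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T \mathbf{w}, \\ &= \mathbf{w}^T \mathbf{S}_B \mathbf{w}. \end{aligned} \tag{2.24}$$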


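The within-class scatter matrix, $\mathbf{S}_c$, is extracted by rewriting the variance of a class after
projection as

$$\begin{aligned} s_c^2 &= \sum_{n} \mathbf{w}^T (\mathbf{x}_{c,n} - \mathbf{m}_c)(\mathbf{x}_{c,n} - \mathbf{m}_c)^T \mathbf{w}, \\ &= \mathbf{w}^T \mathbf{S}_c \mathbf{w}, \end{aligned} \tag{2.25}$$

where subscript $c \in \{1, 2\}$ is the class designator and $\mathbf{S}_c = \sum_{n} (\mathbf{x}_{c,n} - \mathbf{m}_c)(\mathbf{x}_{c,n} - \mathbf{m}_c)^T$.
Substituting $s_1^2 + s_2^2$ in the denominator of Equation (2.23) with

$$\begin{aligned} s_1^2 + s_2^2 &= \mathbf{w}^T \mathbf{S}_1 \mathbf{w} + \mathbf{w}^T \mathbf{S}_2 \mathbf{w}, \\ &= \mathbf{w}^T \mathbf{S}_W \mathbf{w}, \end{aligned} \tag{2.26}$$

where $\mathbf{S}_W = \mathbf{S}_1 + \mathbf{S}_2$, and the numerator with Equation (2.24), Equation (2.23) is rewritten
as

$$J(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{S}_B \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}}. \tag{2.27}$$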
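To evaluate where $J(\mathbf{w})$ is maximized, the gradient with respect to $\mathbf{w}$ is taken and set equal
to zero. The result is [22]

$$\frac{\mathbf{w}^T (\mathbf{m}_1 - \mathbf{m}_2)}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}} \left( 2(\mathbf{m}_1 - \mathbf{m}_2) - 2\, \frac{\mathbf{w}^T (\mathbf{m}_1 - \mathbf{m}_2)}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}} \mathbf{S}_W \mathbf{w} \right) = 0. \tag{2.28}$$

Solving Equation (2.28) for $\mathbf{w}$, and dropping the scalar factor because only the direction
of $\mathbf{w}$ matters,

$$\mathbf{w} = \mathbf{S}_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2). \tag{2.29}$$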


LDA is optimal when the classes are normally distributed [22]. In such a case, the
distribution of class $c$ is $\mathcal{N}(\mathbf{m}_c, \mathbf{S}_W)$, where $\mathbf{S}_W$ is the same as in Equation (2.29).
Additionally, if $\mathbf{m}_1 \approx \mathbf{m}_2$, then $\mathbf{w}$ approaches zero as $\mathbf{m}_1$ goes to $\mathbf{m}_2$. In such a case,
the classes are inseparable with the features used.

We use the LDA classifier later in Chapter III. We do not expect the LDA to perform
well with the high dimensionality of the target states. If $\mathbf{m}_1 \approx \mathbf{m}_2$, then LDA is unable to
separate  the  classes.  Instead,  we  use  the  non-linear  classifier,  RVM,  to  classify  in  the  high 
dimensional  space. 
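The closed-form projection of Equation (2.29) can be sketched for two-dimensional features as follows. This is a minimal illustration using hand-made data; the function names, the sample points, and the choice of the midpoint between the projected means as the decision threshold are assumptions for illustration and are not taken from the classifier configuration used in Chapter III.

```python
def mean_vector(samples):
    """Component-wise mean of a list of feature vectors."""
    n = len(samples)
    return [sum(s[d] for s in samples) / n for d in range(len(samples[0]))]


def scatter_matrix(samples, m):
    """Sum over samples of (x - m)(x - m)^T."""
    p = len(m)
    s = [[0.0] * p for _ in range(p)]
    for x in samples:
        d = [x[i] - m[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class1, class2):
    """w = S_W^{-1} (m1 - m2), written out for two-dimensional features."""
    m1, m2 = mean_vector(class1), mean_vector(class2)
    s1, s2 = scatter_matrix(class1, m1), scatter_matrix(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, x):
    """z = w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical training data for two well-separated classes.
class1 = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
class2 = [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [5.0, 5.0]]

w = fisher_direction(class1, class2)
# Assumed decision rule: threshold at the midpoint of the projected class means.
threshold = 0.5 * (project(w, mean_vector(class1)) + project(w, mean_vector(class2)))


def classify(x):
    """Class 1 projects above the threshold because w points from m2 toward m1."""
    return 1 if project(w, x) > threshold else 2


print(w, threshold)
print([classify(x) for x in class1 + class2])
```

When the two class means approach each other, the difference term shrinks and so does the separation of the projected classes, which is the inseparable case noted above.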

2.4.2  Relevance  Vector  Machine. 

Relevance  vector  machine  (RVM)  is  a  supervised  machine  learning  process  using 
a  Bayesian  framework  and  kernel  functions  to  obtain  sparse  solutions  to  non-linear 
classification  tasks  [23].  RVM  uses  a  Bayesian  framework  applied  to  the  structure  of 
another  sparse  linearly-parameterized  model,  the  support  vector  machine  (SVM).  Similar 
to  LDA,  SVM  attempts  to  maximize  the  spread  between  classes  and  minimize  the  error  or 
variance  within  a  class  [24,  25]. 

SVM classification decisions are based on [24-26]

$$y_i = \mathrm{sgn}(\mathbf{v}^T \mathbf{x}_i + b), \tag{2.30}$$

where $\mathbf{v}$ is a vector of weights defining the hyperplane with a crossing at $b$, $\mathbf{x}_i$ is a feature
vector, and $y_i$ is the class identifier, $y_i \in \{-1, 1\}$. The hyperplane separates the $y_i = 1$
and $y_i = -1$ classes given linearly separable data. The SVM optimizes the hyperplane $\mathbf{v}$,
defined by [25],

$$\mathbf{v} = \sum_{i=1}^{l} y_i \alpha_i \mathbf{x}_i, \tag{2.31}$$

where $l$ is the total number of training vectors. The $\alpha_i$ are Lagrange multipliers and the
dual form of the Lagrangian is [25, 27],

$$L(\boldsymbol{\alpha}) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \langle \mathbf{x}_i \cdot \mathbf{x}_j \rangle, \tag{2.32}$$

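where $\langle \cdot \rangle$ is the kernel operator we will discuss later in this section. The kernel in Equation
(2.32) is the dot product of $\mathbf{x}_i$ and $\mathbf{x}_j$. The vector $\boldsymbol{\alpha}^*$ that maximizes the Lagrangian
in Equation (2.32) while also holding to [25]

$$\sum_{i=1}^{l} y_i \alpha_i = 0, \tag{2.33}$$

and

$$\alpha_i \geq 0, \quad i = 1, \ldots, l, \tag{2.34}$$

optimizes the hyperplane $\mathbf{v}$ in Equation (2.31). The optimized hyperplane, $\mathbf{v}^*$, is defined as

$$\mathbf{v}^* = \sum_{i=1}^{l} y_i \alpha_i^* \mathbf{x}_i. \tag{2.35}$$

The value of $b^*$ is defined where $\mathbf{v}^*$ optimally separates the two classes by [25]

$$b^* = -\frac{\max_{y_i = -1} \langle \mathbf{v}^* \cdot \mathbf{x}_i \rangle + \min_{y_i = 1} \langle \mathbf{v}^* \cdot \mathbf{x}_i \rangle}{2}. \tag{2.36}$$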

The Karush-Kuhn-Tucker complementary conditions apply such that [25, 27]

$$\alpha_i^* \left[ y_i \left( \langle \mathbf{v}^* \cdot \mathbf{x}_i \rangle + b^* \right) - 1 \right] = 0, \tag{2.37}$$

implying that only inputs $\mathbf{x}_i$ closest to the hyperplane have a corresponding non-zero $\alpha_i^*$
[25]. The $\mathbf{x}_i$ with non-zero $\alpha_i^*$ are called support vectors [23-25]. Figure 2.6 illustrates the
concept  in  a  two-dimensional  linearly  separable  feature  space.  The  circled  points  are  the 
support  vectors. 
Figure 2.6: Hyperplane Through Two Linearly Separable Classes [24].


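Using the support vectors, the optimal hyperplane can be expressed without explicitly
defining the hyperplane $\mathbf{v}^*$ as [25]

$$f(\mathbf{x}, \boldsymbol{\alpha}^*, b^*, \mathbf{x}_{i \in sv}) = \sum_{i \in sv} y_i \alpha_i^* \langle \mathbf{x}_i \cdot \mathbf{x} \rangle + b^*, \tag{2.38}$$

where $sv$ are the indices of the support vectors. From Equation (2.38), $\langle \mathbf{x}_i \cdot \mathbf{x} \rangle$ is defined as
the linear kernel.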

For classification of non-linearly separable data, the linear kernel does not provide
separation between the classes in a linear feature space. A non-linear kernel is used to
implicitly map $\mathbf{x}_i$ into a non-linear feature space. The non-linearly separable classes shown
in Figure 2.7 are defined by two parameters, one on each axis in the left image. Notice that a
hyperplane cannot be defined to separate the classes. However, the classes are separable in
a third dimension, as shown on the right side of Figure 2.7. The third dimension is similar
to a non-linear feature space created by a non-linear kernel. SVM uses kernel mapping to
evaluate in the third dimension.

Figure 2.7: Data Re-Mapping Using the Radial Basis Function.

In this thesis we use a non-linear kernel called the radial basis function, defined as [24]


K(x,-,  xj)  =  e 


(2.39) 


where the subscripts on $\mathbf{x}$ are individual instances of the training feature vectors. The form
of the Lagrangian to optimize the hyperplane using the non-linear radial basis kernel
is [24]

$$L(\boldsymbol{\alpha}) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j). \tag{2.40}$$

Solving for $\boldsymbol{\alpha}^*$ as in [25], $b^*$ is defined by [24]

=  (2.41) 

s  iesv  jesv 

The hyperplane is then defined similar to Equation (2.38) as [24, 25, 27]

$$f(\mathbf{x}, \boldsymbol{\alpha}^*, b^*) = \sum_{i \in sv} y_i \alpha_i^* K(\mathbf{x}_i, \mathbf{x}) + b^*. \tag{2.42}$$

Using the same process as with the radial basis function, other non-linear kernels may be
implemented. 
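The kernel form of the decision function in Equation (2.42) can be sketched as follows. This is an illustration only: the radial basis function is written with a width parameter `gamma`, which is an assumed parameterization, and the support vectors, labels, multipliers, and offset are hand-picked hypothetical values rather than the result of solving the optimization.

```python
import math


def rbf_kernel(x_i, x_j, gamma=1.0):
    """Radial basis function kernel, exp(-gamma * ||x_i - x_j||^2).

    The width parameter gamma is an assumed parameterization.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, labels, alphas, b, gamma=1.0):
    """f(x) = sum over support vectors of y_i * alpha_i * K(x_i, x) + b."""
    total = sum(y * a * rbf_kernel(sv, x, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return total + b


def classify(x, support_vectors, labels, alphas, b, gamma=1.0):
    """Class identifier in {-1, 1} from the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, b, gamma)
    return 1 if value >= 0 else -1


# Hypothetical support vectors and multipliers (not obtained by optimization).
svs = [[0.0, 0.0], [2.0, 0.0]]
ys = [1, -1]
alphas = [1.0, 1.0]
offset = 0.0

print(classify([0.1, 0.0], svs, ys, alphas, offset))
print(classify([1.9, 0.0], svs, ys, alphas, offset))
```

Swapping `rbf_kernel` for another kernel function leaves the decision function unchanged, which is the sense in which other non-linear kernels may be implemented by the same process.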

SVM  develops  a  hyperplane  for  use  as  a  classification  boundary,  given  sufficient 
training  vectors.  SVM  distills  the  training  vectors  down  to  support  vectors  used  to  define 
the  hyperplane.  The  hyperplane  is  defined  using  the  kernel  function  in  conjunction  with 
support  vectors.  There  exists  a  linear  correlation  between  the  number  of  training  vectors 
and  support  vectors  [23].  As  the  number  of  support  vectors  grows,  so  does  the  number  of 
basis  functions  of  the  kernel  function.  For  large  training  sets,  the  increase  in  support  vectors 
becomes  computationally  prohibitive.  Additionally,  the  increase  in  the  support  vectors 
reduces  the  smoothness  of  the  boundary  between  classes  leading  to  over-classification.  In 
response  to  the  faults  of  SVM,  RVM  was  developed  by  Tipping  [23]. 

RVM is a Bayesian approach to SVM [23, 28]. The Bayesian approach further
increases the sparseness already present in the SVM support vectors by driving the $\alpha_i$
values toward zero [23, 28]. The new set of vectors, called relevance vectors, is sparsely
determined with the posterior distribution of the training vectors and a limiting prior
distribution on the weights $\alpha_i$ [23].

RVM manipulates the optimal hyperplane in Equation (2.42), removing $y_i$ and $b^*$ to
be [23, 28]

$$f(\mathbf{x}, \boldsymbol{\alpha}^*) = \sum_{i \in rv} \alpha_i^* K(\mathbf{x}_i, \mathbf{x}), \tag{2.43}$$

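where $rv$ are the indices of relevance vectors. Assuming a Gaussian distribution of $y_i$, the
likelihood function of the complete data set, $P(\mathbf{y}|\mathbf{x}, \boldsymbol{\alpha}, \sigma^2)$, is defined as [28],

$$P(\mathbf{y}|\mathbf{x}, \boldsymbol{\alpha}, \sigma^2) = \prod_{i=1}^{N} (2\pi\sigma^2)^{-\frac{1}{2}} e^{-\frac{1}{2\sigma^2}(y_i - f_i)^2}, \tag{2.44}$$

where $N$ is the number of training vectors, $n$ and $i$ are the indexes of the training vectors,
and

$$f_i = \sum_{n=1}^{N} \alpha_n K(\mathbf{x}_n, \mathbf{x}_i). \tag{2.45}$$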
72=1 
Constraining the magnitude of $a$, RVM uses a bias in the form of a zero-mean Gaussian prior
[28],

$$p(a|\vartheta) = \prod_{i=1}^{N} \mathcal{N}(0, \vartheta_i^{-1}). \tag{2.46}$$

A Gaussian prior distribution is enforced over the $a_i$ values with a mean of zero [23]. The
variance parameter $\vartheta$ is defined through the maximization of the marginal likelihood, leading to more
$a_i$ weights evaluating to zero and resulting in an increasingly sparse relevance vector set
[23]. From the prior on $a$ in Equation (2.46) and the likelihood function (2.44), the posterior
probability is represented as [23, 28]

$$p(a, \vartheta, \sigma^2|y) = p(a|y, \vartheta, \sigma^2)\, p(\vartheta, \sigma^2|y). \tag{2.47}$$

From the posterior probability in Equation (2.47), the marginal likelihood $P(y|\vartheta, \sigma^2)$ is
derived [23, 28]. The maximization of the marginal likelihood with respect to $\vartheta$ and $\sigma^2$
gives the optimal hyperplane [23, 28].

Implementing  the  additional  Bayesian  constraints  of  RVM  on  the  SVM  classification 
method  produces  a  sparse  set  of  relevance  vectors  [23].  The  sparsity  of  the  relevance  vector 
set,  enforced  by  Equation  (2.46),  limits  the  number  of  basis  functions,  and  smooths  the 
hyperplane.  We  use  RVM  to  evaluate  the  classification  performance  of  different  feature 
vectors  in  Chapters  III  and  IV.  The  work  done  by  Flynn  [7]  used  RVM  to  compare 
classification  performance  of  different  combinations  of  bandwidth,  elevation,  azimuth  and 
aperture. 
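
As a rough illustration of the kernel expansion and Gaussian likelihood described for RVM, the following minimal sketch evaluates a decision value as a weighted sum of kernel functions over a small set of relevance vectors and computes the log of the Gaussian likelihood. The function names, the radial basis function kernel width `gamma`, and the example values are assumptions made for illustration; the sketch does not perform the marginal likelihood maximization that selects the relevance vectors.

```python
import numpy as np

def rbf_kernel(xa, xb, gamma=1.0):
    """Radial basis function kernel K(xa, xb) = exp(-gamma * ||xa - xb||^2)."""
    diff = np.asarray(xa, dtype=float) - np.asarray(xb, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def decision_value(x, relevance_vectors, weights, gamma=1.0):
    """Evaluate f(x) = sum_i a_i * K(x_i, x) over the relevance vectors only."""
    return sum(a * rbf_kernel(xi, x, gamma)
               for xi, a in zip(relevance_vectors, weights))

def gaussian_log_likelihood(y, f, sigma2):
    """Log of a product of N Gaussian terms with means f and variance sigma2."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    n = y.size
    return float(-0.5 * n * np.log(2.0 * np.pi * sigma2)
                 - np.sum((y - f) ** 2) / (2.0 * sigma2))

# Hypothetical example: two relevance vectors in a two-dimensional feature space.
rv = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
a = [2.0, -1.0]
value_at_first_rv = decision_value(rv[0], rv, a, gamma=1.0)
```

Only the vectors with non-zero weights enter the sum, which is why a sparser relevance vector set reduces the number of kernel evaluations per classified sample.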

2.5  Previous  Work  With  Pixel  Attributes  From  SPLIT 

Previous  work  performed  by  Flynn  used  SPLIT  to  extract  feature  vectors  from  pixels 
of  SAR  images  [7].  Flynn  used  multiple  feature  vectors  extracted  from  each  image  to 
classify  the  vehicle  in  the  image  [7].  The  work  in  this  thesis  is  a  follow-on  effort  to  Flynn’s 
research. 
2.5.1 Flynn’s Work on Pixel Attributes From SPLIT.

The work performed by Flynn analyzed the impact of bandwidth, elevation, azimuth,
and aperture on classification performance [7]. The features used for classification were
formed directly from peak pixel attributes, $[\alpha, k_e, k_o, x, y]$, extracted using SPLIT. The
features from a single pixel made up the feature vector used to classify the image. RVM
and the AFRL civilian vehicles (CV) Domes data set were used to evaluate the performance
of different combinations of bandwidth, elevation, azimuth, and aperture. The next sections
discuss CV Domes and Flynn’s work [7] in more detail.

2.5.2  CV  Domes. 

Training and testing data was drawn from the CV Domes data set. The CV Domes data
set is a collection of X-band scattering data for a set of ten vehicles [1]. Fully polarized,
far-field, monostatic scattering data is simulated over 360 degrees of azimuth and from 30
degrees up to 60 degrees of elevation. In azimuth, data was simulated every 0.0625 degrees,
resulting in a total of 5,760 azimuth samples for each elevation. In elevation, the data was
simulated every 0.0625 degrees, resulting in a total of 481 elevation samples per azimuth
angle.

Phase history is generated for each of the ten vehicles in Figure 1.1 [1]. For each
elevation and azimuth pair, a one-dimensional profile is simulated with 512 frequency
samples, a center frequency of 9.6 GHz, and a bandwidth of 5.35 GHz [1]. The
one-dimensional profiles are documented in the frequency domain as phase history. The data is
fully polarimetric with HH, HV, and VV linear polarization channels [1].

The CV Domes data using full bandwidth with an aperture of 20 degrees has a
cross-range resolution of 0.0448 meters and a range resolution of 0.0280 meters. Range
resolution, $\rho_x$, is a function of the bandwidth, $B$, and the speed of light, $c$, where [8]

$$\rho_x = \frac{c}{2B}. \tag{2.48}$$

Cross-range resolution, $\rho_y$, is a function of the wavelength, $\lambda$, and the aperture, $\Delta\phi$, in
radians where [8]

$$\rho_y = \frac{\lambda}{2\Delta\phi}. \tag{2.49}$$
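
The quoted resolutions can be checked with a short calculation. The sketch below assumes the common relations $\rho_x = c/(2B)$ and $\rho_y = \lambda/(2\Delta\phi)$ with $\Delta\phi$ in radians, and a rounded speed of light of $3 \times 10^8$ m/s; these choices are assumptions that happen to reproduce the stated values, and the function names are illustrative.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded value, an assumption)

def range_resolution(bandwidth_hz):
    """Range resolution c / (2B) in meters."""
    return C / (2.0 * bandwidth_hz)

def cross_range_resolution(center_freq_hz, aperture_deg):
    """Cross-range resolution lambda / (2 * aperture) with the aperture in radians."""
    wavelength = C / center_freq_hz
    return wavelength / (2.0 * math.radians(aperture_deg))

rho_x = range_resolution(5.35e9)               # full CV Domes bandwidth
rho_y = cross_range_resolution(9.6e9, 20.0)    # 20 degree aperture
print(f"range: {rho_x:.4f} m, cross-range: {rho_y:.4f} m")
```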


2.5.3 Flynn’s Results.

Flynn’s research classified SAR images using a single observation state and feature
vectors each tied back to a single pixel [7]. Flynn concluded the research with a
recommendation on the bandwidth, elevation, azimuth, and aperture collection parameters
for SAR images. He used the CV Domes data set to simulate classification performance
with different bandwidth, elevation, azimuth, and aperture parameters. From the
classification performance results, Flynn identified the parameter set with the highest
performance [7]. Flynn recommends an azimuth angle of 90 degrees to 135 degrees, an
aperture size of 60 degrees, an elevation angle of 30 degrees, and a bandwidth from 640
MHz to 3 GHz based on the performance of the parameters [7]. Similarly, we compare
feature vectors in Chapters III and IV.


2.6  Research  Goals  and  Assumptions 

We  want  to  implement  the  pixel  method  from  [7]  using  the  full  extent  of  observation 
angles  and  an  aperture  of  20  degrees.  Considering  the  concept  of  operation  of  a  SAR 
platform  [29],  we  conclude  the  platform  has  limited  control  of  the  observation  angles 
of  a  target.  Because  there  is  limited  control  of  the  elevation  and  azimuth  of  a  target, 
classification  performance  must  be  evaluated  using  the  full  extent  of  observation  angles. 
Additionally, given a bandwidth of 5.35 GHz and a center frequency $f_c$ = 9.6 GHz, a 60
degree aperture is considered a wide-angle synthetic aperture [30]. A wide-angle synthetic
aperture is any synthesized aperture having an angular extent, $\Delta\phi$, greater than that required to
have equivalent resolution in range and cross-range [30], that is,

$$\Delta\phi > 2\sin^{-1}\left(\frac{BW}{2 f_c}\right), \tag{2.50}$$


where $BW$ is the bandwidth and $f_c$ is the center frequency. Also, unlike ideal point scatterers,
canonical scatterers have an angular persistence of less than 20 degrees [31]. For these
reasons,  we  use  a  20  degree  aperture.  Note  that,  given  a  distinct  SAR  collection  scenario, 
the  aperture  may  be  different,  as  the  aperture  is  dependent  on  the  equipment  and  the  concept 
of  operation. 
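
To make the wide-angle criterion concrete, the following sketch evaluates the angular extent at which range and cross-range resolution are equal, assuming the threshold takes the form $2\sin^{-1}(BW/(2f_c))$. The function name is illustrative, and the formula is an assumption about the definition in [30].

```python
import math

def wide_angle_threshold_deg(bandwidth_hz, center_freq_hz):
    """Angular extent (degrees) above which an aperture counts as wide-angle."""
    return math.degrees(2.0 * math.asin(bandwidth_hz / (2.0 * center_freq_hz)))

threshold = wide_angle_threshold_deg(5.35e9, 9.6e9)
is_60_deg_wide_angle = 60.0 > threshold   # the aperture recommended in [7]
is_20_deg_wide_angle = 20.0 > threshold   # the aperture used in this work
print(f"threshold: {threshold:.1f} degrees")
```

Under these assumptions the threshold is a little over 32 degrees, so a 60 degree aperture is wide-angle while a 20 degree aperture is not.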

From the CV Domes data set, there are a total of 27,705,600 possible target states given
the  constraints  of  a  20  degree  aperture,  and  a  constant  elevation  angle  across  the  aperture. 
Figure  2.8  illustrates  all  possible  target  states.  Separating  the  CV  Domes  into  sedans  and 
SUVs,  there  are  16,623,360  possible  sedan  states  and  11,082,240  possible  SUV  states.  To 
reduce  the  data  size  and  respect  computational  limitations,  we  sparsely  sample  from  the  full 
extent  of  elevation  and  azimuth.  The  rate  at  which  we  sample  the  target  states  is  defined  in 
Sections  3.1  and  3.3. 
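
The target state counts follow from multiplying the number of vehicles, elevation samples, and azimuth samples. The sketch below reproduces the arithmetic; the split of six sedans and four SUVs is inferred from the stated class totals rather than given explicitly here, and the helper name is illustrative.

```python
def count_samples(start, stop, step, inclusive=True):
    """Number of samples from start to stop at a fixed step."""
    n = int(round((stop - start) / step))
    return n + 1 if inclusive else n

n_elevation = count_samples(30.0, 60.0, 0.0625)                  # 30 : 0.0625 : 60
n_azimuth = count_samples(0.0, 360.0, 0.0625, inclusive=False)   # 0 : 0.0625 : 359.9375
states_per_vehicle = n_elevation * n_azimuth

total_states = 10 * states_per_vehicle
sedan_states = 6 * states_per_vehicle   # assumes six sedans (inferred)
suv_states = 4 * states_per_vehicle     # assumes four SUVs (inferred)
print(total_states, sedan_states, suv_states)
```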


[Figure 2.8 diagram: vehicle model, 10 instances (CV Domes vehicles); elevation, 481 instances (30 : 0.0625 : 60 degrees); azimuth, 5760 instances (0 : 0.0625 : 359.9375 degrees). Example state in the SUV class: elevation 31.7500, azimuth 50, model Jeep, model year 1999. Example state in the sedan class: elevation 34.2500, azimuth 30, model Camry.]

Figure 2.8: All Possible Target States From the CV Domes Data Set.
We want to capture the entire image in a single feature vector to investigate the
attributes  of  the  entire  image  for  impact  on  classification  performance.  Evaluating  the 
performance  of  subsets  of  features  in  the  feature  vector  enables  us  to  relate  classification 
performance  to  individual  features.  If  the  features,  and  the  attributes  they  are  composed  of, 
are  tied  back  to  physical  elements  of  the  vehicles,  then  we  can  tie  physical  elements  of  the 
vehicles  to  impact  in  classification  performance. 

In  Chapter  III,  we  construct  a  process  to  form  feature  vectors  from  an  entire  SAR 
image.  First,  we  extend  the  research  by  Flynn  [7]  to  use  more  target  states  and  investigate 
the  corresponding  classification  performance  of  subsets  of  the  overall  feature  vector.  With 
a  baseline  of  the  performance  from  Flynn’s  feature  extraction  method  [7],  we  extend  the 
method  to  form  features  across  the  entire  image  space.  Utilizing  features  from  the  entire 
image  space,  we  investigate  the  corresponding  classification  performance  of  subsets  of  all 
the  features  we  extract  from  the  image. 
III. Application of Feature Extraction Methods


The  goal  of  this  chapter  is  to  apply  various  methods  for  feature  extraction  and  to 
evaluate  their  corresponding  classification  performance.  Classification  performance  is 
evaluated  using  the  linear  and  non-linear  classifiers,  LDA  and  RVM  respectively.  In 
Sections  3.1  and  3.2,  the  classification  performance  of  feature  vectors  formed  using  the 
pixel method is evaluated. In Sections 3.3-3.5, the classification performance of feature
vectors  formed  using  the  cell  method  is  evaluated.  Section  3.6  reports  the  notable 
conclusions  from  the  pixel  method  and  the  cell  method.  Section  3.7  introduces  the  concept 
of  feature  saliency. 

3.1  Pixel  Method  for  Extracting  Features 

The  pixel  method  for  extracting  features  is  similar  to  the  method  used  by  Flynn  [7]. 
There  are  two  major  differences  between  the  implementation  of  the  pixel  method  employed 
in  this  work  and  previous  work  [7].  First,  we  train  and  test  using  target  states  spanning 
180  degrees  in  azimuth  and  up  to  nine  degrees  in  elevation.  The  second  difference  is  the 
implementation  of  a  segmentation  process  to  reduce  the  number  of  feature  vectors  extracted 
from  each  target  state’s  image. 

3.1.1  Pixel  Method. 

We  use  the  process  shown  in  Figure  3.1  to  implement  the  pixel  method.  Attributes  are 
extracted  from  the  data  using  SPLIT.  Features  are  formed  from  the  attributes  of  individual 
pixels.  The  extracted  features  then  are  used  to  populate  a  feature  vector  to  be  analyzed. 
The  classification  performance  of  various  feature  vectors  is  then  simulated  and  evaluated. 
The image segmentation and formation of features, shown in the third block of Figure 3.1,
is unique to the pixel method. We are unable to process all of the data from the CV Domes
data set and pull all possible feature vectors from each image due to computational limits.
We limit the number of feature vectors by sparsely sampling the target states from the CV
Domes data set and using image segmentation.


[Figure 3.1 block diagram: data; SPLIT extracts all attributes; segment image and form features; select feature vector to evaluate performance; populate feature vector; classifier (random: 80% train, remaining: 20% test); Pc(1) through Pc(40); mean[Pc]; classification performance of feature vector. The steps from populating the feature vector through the classifier repeat for a Monte Carlo analysis of 40 trials.]

Figure 3.1: Block Diagram of Classification Performance Analysis of Pixel Method
Feature Vectors.
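
The Monte Carlo evaluation in the block diagram can be sketched as repeated random 80/20 splits followed by averaging the per-trial percent correct. The sketch below is a minimal illustration, not the implementation used in this work: the nearest-centroid classifier is a stand-in for LDA or RVM, and the function names, random seed, and synthetic data are assumptions.

```python
import numpy as np

def nearest_centroid_fit_predict(x_train, y_train, x_test):
    """Stand-in classifier (an assumption; the text uses LDA and RVM)."""
    classes = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def monte_carlo_pc(features, labels, fit_predict, trials=40, train_frac=0.8, seed=0):
    """Mean and standard deviation of percent correct over random train/test splits."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    n_train = int(round(train_frac * n))
    pcs = []
    for _ in range(trials):
        order = rng.permutation(n)                  # new random split each trial
        train, test = order[:n_train], order[n_train:]
        predicted = fit_predict(features[train], labels[train], features[test])
        pcs.append(np.mean(predicted == labels[test]))
    return float(np.mean(pcs)), float(np.std(pcs))

# Synthetic, well-separated two-class data purely for demonstration.
demo_rng = np.random.default_rng(1)
demo_features = np.vstack([demo_rng.normal(0.0, 0.1, (50, 2)),
                           demo_rng.normal(5.0, 0.1, (50, 2))])
demo_labels = np.array([0] * 50 + [1] * 50)
mean_pc, std_pc = monte_carlo_pc(demo_features, demo_labels, nearest_centroid_fit_predict)
```

The standard deviation returned alongside the mean corresponds to the kind of error bar reported with classification performance.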


Using the extracted feature vectors from all of the pixels with a valid alpha value, as
was performed by Flynn in [7], is computationally expensive when implementing multiple
observation states. To reduce the computational costs, we reduce the number of feature
vectors we extract per image using segmentation. The pixels in the SAR images are
segmented using threshold values of all the attributes of pixels, $[x, y, A, \alpha, k_o, k_e]$, based
on the ideal mapping shown in Figure 3.2. Unlike previous work [11], we implement a
threshold on distances from peak amplitude pixels within the image instead of ideal pixel
attributes to segment pixels.

Figure 3.2: Three Dimensional Attribute Space [11].


We  utilize  the  process  shown  in  Figure  3.3  to  segment  the  pixels  in  an  image.  The 
pixels  are  sorted  from  highest  to  lowest  by  amplitude.  The  first  pixel  is  defined  to  be 
an  associate  pixel.  The  subsequent  pixels  are  segmented  with  an  associate  pixel  or  are 
assigned  to  be  a  new  associate  pixel  based  on  a  threshold  on  distance  between  pixels  in 
attribute  space. 

Figure  3.2  shows  there  are  three  general  cases  for  the  threshold  on  distance  in  attribute 
space  under  which  two  pixels  can  still  be  classified  as  the  same  type  of  canonical  shape. 
The  first  case  is  the  ideal  mapping  of  a  dihedralo  in  attribute  space.  The  second  case  is  the 
mapping  of  a  cylinder0  in  attribute  space.  The  third  case  is  the  mapping  of  the  remaining 
canonical  shapes.  To  model  the  different  cases,  we  use  three  different  sets  of  thresholds 
depending  on  the  location  of  the  associate  pixel  in  attribute  space. 

• Case 1: If the associate pixel has $\alpha_{as} < 0.5$, $k_{e,as} > 0.5$, and $k_{o,as} < 0.5$.

• Case 2: If the associate pixel has $\alpha_{as} < -0.5$, $k_{e,as} < 0.5$, and $k_{o,as} < 0.5$.

• Case 3: If the associate pixel is not Case 1 or Case 2.

Figure 3.3: Block Diagram of the Image Segmentation Process.


For associate pixels that exist in Case 1, the threshold on the distance between a pixel and
the associate pixel is

$$|\alpha_{as} - \alpha_{ii}| \leq 3 \;\&\; |k_{e,as} - k_{e,ii}| \leq 0.5 \;\&\; |k_{o,as} - k_{o,ii}| \leq 0.5 \;\&\; |x_{as} - x_{ii}| \leq 0.75 \;\&\; |y_{as} - y_{ii}| \leq 0.75, \tag{3.1}$$

where the subscript $as$ is the index of associate pixels and the subscript $ii$ is the index of
the pixel being segmented. For associate pixels that exist in Case 2, the threshold on the
distance between a pixel and the associate pixel is

$$|\alpha_{as} - \alpha_{ii}| \leq 3 \;\&\; |k_{e,as} - k_{e,ii}| \leq 0.5 \;\&\; |k_{o,as} - k_{o,ii}| \leq 0.5 \;\&\; |x_{as} - x_{ii}| \leq 0.75 \;\&\; |y_{as} - y_{ii}| \leq 0.75. \tag{3.2}$$

For associate pixels that exist in Case 3, the threshold on the distance between a pixel and
the associate pixel is

$$|\alpha_{as} - \alpha_{ii}| \leq 1 \;\&\; |k_{e,as} - k_{e,ii}| \leq 0.5 \;\&\; |k_{o,as} - k_{o,ii}| \leq 0.5 \;\&\; |x_{as} - x_{ii}| \leq 0.75 \;\&\; |y_{as} - y_{ii}| \leq 0.75. \tag{3.3}$$


Figure 3.4 shows an example of pixel segmentation based on the outlined process. After
the segmentation process, feature vectors are formed from the attributes of the associate
pixels. The segmentation process shown in Figure 3.3 reduces the number of feature vectors
extracted for an image by more than an order of magnitude.

Figure 3.4: Segmented Pixels With the Ideal Mapping of Extracted Attributes to Canonical
Shapes of Toyota Camry at 30 Degrees Elevation and 50 Degrees Azimuth.
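
The segmentation procedure can be sketched as a greedy pass over pixels sorted by amplitude: each pixel joins the first associate pixel it falls within threshold of, or becomes a new associate pixel. The sketch below is a minimal illustration under stated assumptions, not the implementation used in this work. The column ordering, the threshold values, the inclusive comparisons, and the rule of joining the first matching associate are all assumptions.

```python
import numpy as np

# Columns of each pixel row: x, y, A, alpha, k_o, k_e (ordering is an assumption).
X, Y, AMP, ALPHA, K_O, K_E = range(6)

def thresholds_for(associate):
    """Per-attribute distance limits chosen by the associate pixel's case (assumed values)."""
    case_1 = associate[ALPHA] < 0.5 and associate[K_E] > 0.5 and associate[K_O] < 0.5
    case_2 = associate[ALPHA] < -0.5 and associate[K_E] < 0.5 and associate[K_O] < 0.5
    alpha_limit = 3.0 if (case_1 or case_2) else 1.0
    return {ALPHA: alpha_limit, K_E: 0.5, K_O: 0.5, X: 0.75, Y: 0.75}

def segment_pixels(pixels):
    """Return associate pixel indices and the segment index assigned to every pixel."""
    pixels = np.asarray(pixels, dtype=float)
    order = np.argsort(-pixels[:, AMP], kind="stable")   # highest amplitude first
    associates = []
    assignment = np.empty(len(pixels), dtype=int)
    for idx in order:
        for seg, a_idx in enumerate(associates):
            limits = thresholds_for(pixels[a_idx])
            if all(abs(pixels[a_idx, col] - pixels[idx, col]) <= lim
                   for col, lim in limits.items()):
                assignment[idx] = seg        # join an existing segment
                break
        else:
            assignment[idx] = len(associates)   # start a new segment
            associates.append(int(idx))
    return associates, assignment

# Hypothetical pixels: two close together and one far away.
demo_pixels = [[0.0, 0.0, 10.0, 0.0, 0.0, 0.0],
               [0.1, 0.1, 5.0, 0.0, 0.0, 0.0],
               [5.0, 5.0, 8.0, 0.0, 0.0, 0.0]]
demo_associates, demo_assignment = segment_pixels(demo_pixels)
```

Only the associate pixels contribute feature vectors, which is how segmentation reduces the number of feature vectors per image.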

Because segments are defined by the pixel with the greatest amplitude, the process of
segmentation implicitly captures pixel amplitude information in the location features,
$x$ and $y$. The pixel amplitude is also explicitly captured in the amplitude feature, $A$.
With the feature vector from segmentation, we evaluate the classification performance of
the feature vectors formed from different sets of features itemized in Table 3.1.


Table 3.1: Pixel Based Feature Vectors.

Reference #    Feature Vector
1              $[x, y]$
2              $[x, y, A]$
3              $[x, y, k_o]$
4              $[x, y, k_e]$
5              $[x, y, k_o, k_e]$
6              $[x, y, A, k_o, k_e]$
7              $[x, y, \alpha]$
8              $[x, y, A, \alpha]$
9              $[x, y, k_o, \alpha]$
10             $[x, y, k_e, \alpha]$
11             $[x, y, k_o, k_e, \alpha]$
12             $[x, y, A, k_o, k_e, \alpha]$


If  we  evaluate  with  all  possible  target  states  from  the  CV  Domes  data  set,  we  are 
unable  to  process  all  the  data  with  the  computational  resources  available.  Respecting 
computational  limitations,  we  use  the  states  shown  in  Figure  3.5.  We  increase  the  span 
of  elevation  states  from  30  to  32  degrees  up  to  30  to  39  degrees  and  report  the  impact 
of  increasing  the  diversity  in  elevation  to  classification  performance.  Computational 
limitations  prohibit  examining  a  greater  diversity  in  elevation. 
[Figure 3.5 diagram: vehicle model, 10 instances (CV Domes vehicles); elevation, 37 instances (30 : 0.2500 : 39 degrees); azimuth, 9 instances (10 : 20 : 170 degrees). Example state in the SUV class: elevation 31.7500, azimuth 50, model Jeep, model year 1999. Example state in the sedan class: elevation 34.2500, azimuth 30, model Camry.]

Figure 3.5: Target States Used to Evaluate Classification Performance of Pixel Method.

3.2  Pixel  Method  Results 

We  evaluated  the  feature  vectors  in  Table  3.1  using  the  process  shown  in  Figure 
3.1.  The  metric  used  to  evaluate  classification  performance  is  the  mean  classification 
performance  for  the  feature  vectors  based  on  30  trials  using  randomly  chosen  training  and 
testing  data.  We  compare  the  averaged  corresponding  performance  of  each  feature  vector 
to  evaluate  the  pixel  method  for  classifying  targets. 

Feature  vectors  from  Table  3.1  containing  the  amplitude  feature  consistently  have  the 
performance  of  a  1R  classifier  [32],  where  all  test  feature  vectors  are  classified  to  be  the 
class  with  the  greatest  number  of  training  feature  vectors.  The  classification  results  for  all 
feature  vectors  with  an  amplitude  feature  have  100  percent  correct  classification  for  sedans 
and  zero  percent  correct  classification  for  SUVs.  The  classification  performance  of  the 
remaining  feature  vectors  is  reported  in  Figure  3.6. 

The use of polarization response attributes is associated with the highest classification
performance. The highest performing feature vectors are feature vectors three and five from
Table 3.1, both of which include the odd bounce polarization attribute, $k_o$. Feature vector
four is the next best performing feature vector and includes the even bounce polarization
attribute, $k_e$.

Figure 3.6: Classification Performance Versus Elevation Sampling Diversity. Elevation
sampling diversity is from 30 to $\theta$ degrees elevation. The error bars are the standard
deviation of the classification performance.

The use of frequency response attributes reduces classification performance. For
example, the classification performance of feature vector one is reduced with the addition
of the frequency response attribute, $\alpha$, in feature vector seven. The highest performing
feature vector is reduced by more than its standard deviation with the addition of $\alpha$ in
feature vector 11.

The classification performance using the pixel method for feature extraction peaks at
66 percent correct classification, although the results from other methods show that higher
classification performances are attainable [3-6]. Due to the poor classification performance
of the pixel method under the given test conditions, we want to explore another option for
feature extraction.

3.2.1  Motivation  for  Cell  Method. 

It  is  desirable  to  improve  classification  performance,  and  to  identify  attributes  and 
features  which  have  the  greatest  impact  on  the  classification  performance.  Backward 
selection  is  a  common  method  for  analyzing  the  impact  of  features  on  classification, 
and  will  be  discussed  in  Section  4.1.  However,  the  structure  of  a  pixel  method  feature 
vector  is  not  appropriate  for  backward  selection.  A  single  feature  vector  that  captures  the 
information  within  an  entire  SAR  image  is  required  for  backward  selection.  One  method  of 
constructing  a  single  feature  vector  from  an  image  is  to  use  cells  as  a  framework  to  extract 
features. 

3.3  Cell  Method  for  Extracting  Features 

An  image  is  composed  of  a  matrix  of  pixels  which  may  be  divided  into  spatial  regions 
defined  as  cells  and  blocks.  Cells  do  not  overlap,  and  four  cells  compose  a  spatial  region 
defined  as  a  block.  The  relationship  of  a  pixel  to  a  cell  to  a  block  is  captured  in  Figure 
3.7. The use of cells as a framework to extract features is presented in Dalal and Triggs’
histograms  of  oriented  gradients  (HOG)  work  [33].  Features  are  extracted  in  two  ways 
from  the  cells.  First,  the  mean  and  mode  of  attributes  within  the  cells  are  used  to  form 
features.  Second,  HOG  is  used  to  form  features. 

Six feature types are formed using the mean and mode of an attribute within a cell.
The “$\alpha$ mean” feature type is the mean of the $\alpha$ attribute within a cell. The “$\alpha$ mode”
feature type is the mode of the $\alpha$ attribute within a cell. The “$k_e$ mean” feature type is the
mean of the $k_e$ attribute within a cell. The “$k_e$ mode” feature type is the mode of the $k_e$
attribute within a cell. The “$k_o$ mean” feature type is the mean of the $k_o$ attribute within a
cell. The “$k_o$ mode” feature type is the mode of the $k_o$ attribute within a cell. The second
method we use to form features is HOG on amplitude and polarization images.

Figure 3.7: The Pixel to Cell to Block Relationship.
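
The mean and mode feature types can be sketched by dividing an attribute image into a grid of non-overlapping cells and summarizing each cell. The sketch below is a minimal illustration under stated assumptions. It assumes every pixel carries a value for the attribute and that the mode is taken after rounding to one decimal place; neither choice is specified in the text, and the function name is illustrative.

```python
import numpy as np

def cell_mean_mode_features(attribute_image, grid=(6, 6), decimals=1):
    """Mean and mode of an attribute image within each cell of a grid."""
    img = np.asarray(attribute_image, dtype=float)
    row_edges = np.linspace(0, img.shape[0], grid[0] + 1).astype(int)
    col_edges = np.linspace(0, img.shape[1], grid[1] + 1).astype(int)
    means, modes = [], []
    for r in range(grid[0]):
        for c in range(grid[1]):
            cell = img[row_edges[r]:row_edges[r + 1], col_edges[c]:col_edges[c + 1]]
            means.append(cell.mean())
            # Mode of the rounded values; ties resolve to the smallest value.
            values, counts = np.unique(np.round(cell, decimals), return_counts=True)
            modes.append(values[np.argmax(counts)])
    return np.array(means), np.array(modes)

# Small hypothetical attribute image split into a 2x2 grid of cells.
demo_image = [[1, 1, 2, 2],
              [1, 0, 2, 2],
              [3, 3, 4, 4],
              [3, 3, 4, 5]]
demo_means, demo_modes = cell_mean_mode_features(demo_image, grid=(2, 2))
```

With a 6x6 grid, each mean or mode feature type would contribute 36 entries to the feature vector under this sketch.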

Histogram of oriented gradients (HOG) is a tool originating from computer vision and
image processing which defines the orientation of contours within an image space using
gradient computations based on cells of an image [33]. HOG calculates the gradient vector,
also known as an image gradient, for each pixel, $p_c$, within each cell of an image [33]. The
gradient vector of pixel $p_c$ is

$$[\Delta p_x, \Delta p_y] = [p_r - p_l, p_u - p_d], \tag{3.4}$$

where the notation is shown in Figure 3.8. The magnitude of the gradient vector of a
pixel, $p_{mag}$, is $p_{mag} = \sqrt{\Delta p_x^2 + \Delta p_y^2}$. The angle of the gradient vector of a pixel, $p_{ang}$, is
$p_{ang} = \tan^{-1}\left(\frac{\Delta p_y}{\Delta p_x}\right)$.

A histogram of gradients is constructed for each cell from the gradient vectors of all
of the pixels in each cell. The bins of the histogram are the gradient vector angles. The
magnitude in each bin is the sum of the gradient vector magnitudes of each pixel in the bin.
From the image in Figure 3.9, HOG forms the histogram in Figure 3.10 for a single cell.

Figure 3.8: Example for Pixel Gradient Vectors.
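
The gradient and histogram steps can be sketched for a single cell as follows. This is a minimal illustration, not the toolbox implementation used in this work. The nine unsigned orientation bins over 0 to 180 degrees and the edge replication at the cell border are assumptions, and the sketch omits the interpolation between bins that full HOG implementations typically apply.

```python
import numpy as np

def cell_gradient_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of gradient angles for one cell."""
    cell = np.asarray(cell, dtype=float)
    padded = np.pad(cell, 1, mode="edge")         # replicate border pixels (assumption)
    dx = padded[1:-1, 2:] - padded[1:-1, :-2]     # right neighbor minus left neighbor
    dy = padded[:-2, 1:-1] - padded[2:, 1:-1]     # upper neighbor minus lower neighbor
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 180.0   # unsigned orientation
    hist, edges = np.histogram(angle, bins=n_bins, range=(0.0, 180.0),
                               weights=magnitude)
    return hist, edges

# Hypothetical cell with intensity increasing left to right only.
demo_cell = np.tile(np.arange(4.0), (4, 1))
demo_hist, demo_edges = cell_gradient_histogram(demo_cell)
```

For the horizontal ramp in the example, every gradient points along the horizontal axis, so all of the gradient magnitude falls in the first bin.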


[Figure 3.9 image; horizontal axis: X (meters).]

Figure 3.9: Non-Coherently Formed 360 Degree Image of a Toyota Camry, Formed with
20 Degree Apertures.

[Figure 3.10 plot; horizontal axis: Gradient Vector Bins.]

Figure 3.10: Histogram of gradients of a cell of Figure 3.9. Cell spans x = [-3 : -2] and
y = [1.333 : 2].

HOG groups cells into larger spatial regions defined as blocks [33]. The blocks
overlap, covering the entire image. Within each block, the histograms of all the
cells are normalized, and the normalized copies of the histograms are defined as features.
Because of the overlapping blocks, multiple normalized histograms are defined as features
for each cell. If HOG is implemented without normalization, then the original histogram
bin magnitudes from each cell are also recorded as features. We implement HOG using
MATLAB’s “computer vision” toolbox [34].
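
The effect of overlapping blocks on the feature count can be sketched as follows. This is a minimal illustration, not the toolbox implementation: the 2x2 cell blocks advanced one cell at a time and the plain L2 normalization are assumptions, and the function name is illustrative.

```python
import numpy as np

def block_normalized_features(cell_hists, block=2, eps=1e-6):
    """Concatenate L2-normalized histograms from overlapping blocks of cells.

    cell_hists has shape (rows, cols, n_bins), one histogram per cell.
    """
    cell_hists = np.asarray(cell_hists, dtype=float)
    rows, cols, _ = cell_hists.shape
    features = []
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            v = cell_hists[r:r + block, c:c + block].ravel()
            features.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(features)

# Hypothetical 6x6 grid of cells with nine bins each.
demo_features = block_normalized_features(np.ones((6, 6, 9)))
```

Because interior cells belong to several blocks, each of their histograms appears several times in the output with a different normalization, which is why the normalized feature vector is longer than the unnormalized one.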

Using HOG, we derive three feature types: HOG of amplitude, HOG of $k_e$, and HOG
of $k_o$. The name of each type comes from the attribute used to develop the image on
which HOG operates. Figure 3.11 shows an image developed from the amplitude
attributes of pixels extracted using SPLIT. Figure 3.12 shows an image developed from the
$k_e$ attributes of pixels extracted using SPLIT. Figure 3.13 shows an image developed from
the $k_o$ attributes of pixels extracted using SPLIT.
Figure 3.11: Image From Amplitude Attribute of Pixels. Source is a 20 degree aperture of
a Toyota Camry at 30 degrees elevation and 10 degrees azimuth.

Figure 3.12: Image from $k_e$ Attribute of Pixels. Source is a 20 degree aperture of a Toyota
Camry at 30 degrees elevation and 10 degrees azimuth.

Figure 3.13: Image from $k_o$ Attribute of Pixels. Source is a 20 degree aperture of a Toyota
Camry at 30 degrees elevation and 10 degrees azimuth.


3.4  Cell  Method  Evaluation 

We  use  the  process  shown  in  Figure  3.14  to  evaluate  the  classification  performance 
of  the  cell  method.  Attributes  are  extracted  from  the  data  using  SPLIT.  Features  are  then 
formed  from  the  attributes  of  the  cells.  The  resulting  features  are  used  to  populate  feature 
vectors  for  evaluation.  The  classification  performance  of  the  resulting  feature  vectors  may 
then  be  evaluated. 

We  evaluate  the  cell  method  using  a  6x6  grid  of  cells.  A  comparison  of  a  5x5  grid, 
a  6x6  grid,  a  7x7  grid,  and  an  8x8  grid,  shown  in  Figure  3.15,  gives  no  clear  indication 
of  a  superior  grid  size.  The  6x6  grid  of  cells  gives  a  similar  symmetric  grid  of  cells  and  a 
smaller  overall  feature  vector  than  an  8x8  grid  of  cells.  The  6x6  grid  of  cells  is  laid  out  and 
labeled  as  shown  in  Figure  3.16.  The  labels  are  used  for  reference  in  Section  4.3.  The  SAR 
image,  on  which  the  6x6  grid  is  overlaid,  is  a  single  target  state  of  a  Toyota  Camry  at  10 
degrees  azimuth  and  30  degrees  elevation. 
Figure 3.14: Block Diagram of Classification Performance Analysis of Cell Method
Feature Vectors.

Figure 3.15: Comparison of Cell Size Classification Performance with the RVM Classifier.

Figure 3.16: 6x6 Grid of Cells Laid Over a SAR Image.

If we evaluate the cell method with all possible target states from the CV Domes data
set,  then  we  are  unable  to  process  all  the  data  with  the  computational  resources  available. 
To  reduce  the  computational  resources  required,  the  CV  Domes  data  set  is  sparsely  sampled 
every  0.25  degrees  in  elevation,  20  degrees  in  azimuth,  and  using  every  vehicle.  The 
result  is  non-overlapping  apertures  covering  360  degrees  on  the  vehicles  and  30  degrees 
in  elevation.  A  total  of  21,780  target  states  are  used.  Respecting  the  computational  limit, 
we  use  the  states  shown  in  Figure  3.17. 


[Figure 3.17 diagram: vehicle model, 10 instances (CV Domes vehicles); elevation, 121 instances (30 : 0.2500 : 60 degrees); azimuth, 18 instances (10 : 20 : 350 degrees). Example state in the SUV class: elevation 31.7500, azimuth 50, model Jeep, model year 1999. Example state in the sedan class: elevation 34.2500, azimuth 30, model Camry.]

Figure 3.17: Target States Used to Evaluate Classification Performance of Cell Method.


The feature vectors itemized in Table 3.2 are 21 different combinations of feature types
we evaluate for classification performance. Entries in the vector length column of Table 3.2 with
two values correspond to the feature vector length with HOG normalization and without
HOG normalization, respectively. Comparison of the classification performance of each
feature vector shows the comparative impact of the feature types. Section 3.5 reports the
classification performance of each feature vector.

Table 3.2: Feature Vectors for Classification.


3.5  Cell  Method  Results 


Using  the  cell  method  described  in  Section  3.3,  the  classification  performance  of 
the feature vectors is evaluated and then itemized in Tables 3.3-3.6. The classification
performance  using  the  cell  method  is  better  than  that  of  the  pixel  method.  Similar  to 
the  pixel  method  results,  feature  vectors  including  polarization  attributes  outperform  those 
which  included  only  amplitude  attributes,  as  well  as  those  including  frequency  attributes. 
In  some  cases,  the  results  also  show  an  enhanced  classification  performance  when  a  set  of 
feature  types  are  used  to  construct  a  feature  vector.  The  classification  performance  of  the 
LDA  classifier  proved  to  be  lower  than  the  classification  performance  of  the  RVM  classifier 
under  the  test  conditions. 

3.5.1  Cell  Method  Results  Using  Linear  Discriminant  Analysis. 

The LDA classification performance using the features from the cell method without normalized HOG is reported in Table 3.3 and varies between 56.75 and 75.91 percent correct classification. The use of frequency response features is correlated with the lowest classification performance, which is consistent with the pixel method results. Feature vector one, with only the $\alpha$ mean feature type, shows the lowest classification performance of all the feature vectors. The second worst classification performance is demonstrated by feature vector four, which has only the $\alpha$ mode feature type. Of all the feature types, the $\alpha$ mean and $\alpha$ mode features have the smallest positive impact on classification performance.

The use of the odd bounce polarization response corresponds to the highest classification performance for features not derived using HOG. Feature vector six, with only the $k_o$ mode feature type, shows the highest classification performance among feature vectors one through nine, each of which has only one feature type. Feature vector three, with only the $k_o$ mean feature type, has the second highest classification performance.

Table 3.3: Feature Vectors Classification Results Using LDA Without HOG Normalization.


Using normalization, Dalal and Triggs improved the classification performance of HOG features for human detection by four percent [33], and similar performance gains are observed for the HOG features when Table 3.4 is compared with Table 3.3. The use of HOG normalization on the HOG feature types corresponds to an improvement in classification performance of seven to ten percent correct classification. The additional dimensions in the feature vector, due to the normalization process, improve the linear classification performance of feature vectors with HOG feature types.
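
The effect of normalization on dimensionality can be illustrated with a minimal sketch. It assumes a Dalal and Triggs style scheme in which cell histograms are grouped into overlapping 2 x 2 blocks with a stride of one cell, and each block is scaled to unit L2 norm. The block size, stride, grid size, and function names are assumptions for illustration and are not taken from this thesis.

```python
import math


def l2_normalize(values, eps=1e-6):
    """Scale a block descriptor to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in values) + eps * eps)
    return [v / norm for v in values]


def normalize_hog(cell_hists, block=2):
    """Return one normalized descriptor per overlapping block of cells.

    cell_hists[r][c] is the list of histogram bins for cell (r, c).
    Blocks are block x block cells and advance one cell at a time.
    """
    rows, cols = len(cell_hists), len(cell_hists[0])
    descriptors = []
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            flat = [b
                    for dr in range(block)
                    for dc in range(block)
                    for b in cell_hists[r + dr][c + dc]]
            descriptors.append(l2_normalize(flat))
    return descriptors


# Hypothetical 4 x 4 grid of cells with 9 orientation bins per cell.
cells = [[[float(r + c + k + 1) for k in range(9)] for c in range(4)]
         for r in range(4)]
blocks = normalize_hog(cells)
raw_dims = 4 * 4 * 9
normalized_dims = len(blocks) * len(blocks[0])
print(raw_dims, normalized_dims)  # 144 324
```

Because each interior cell appears in several overlapping blocks, the normalized descriptor has more dimensions than the raw histograms, which is the kind of dimensional growth the paragraph above refers to.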


Table 3.4: Feature Vectors Classification Results Using LDA With HOG Normalization.

Feature type columns, in order: $\alpha$ mean, $\alpha$ mode, $k_e$ mean, $k_e$ mode, $k_o$ mean, $k_o$ mode, amp HOG, $k_e$ HOG, $k_o$ HOG.

Feature Vector    Feature Types Marked    Performance (% Correct)
7                 1                       85.94
8                 1                       79.99
9                 1                       83.67
14                2                       76.66
15                2                       74.09
16                3                       69.32
17                3                       67.31
18                4                       70.33
19                4                       66.64
20                7                       66.71
21                9                       65.73

3.5.2  Cell  Method  Results  Using  Relevance  Vector  Machine. 

The RVM classification performance varies between 72.65 and 95.90 percent correct classification using the features from the cell method without normalized HOG, as reported in Table 3.5. The use of frequency response features correlates with the lowest classification performance, which is consistent with the pixel method results. Feature vector one, with only the $\alpha$ mean feature type, shows the lowest classification performance of all the feature vectors. The second worst classification performance corresponds to feature vector three, with only the $\alpha$ mode feature type. Adding $\alpha$ features to the highest performing feature vector, feature vector 17, correlates with a reduction in classification performance of 1.58 percent in feature vector 19. A reduction in classification performance is reported in every instance where an $\alpha$ feature type is added to a feature vector in Table 3.5. Of all the feature types, the $\alpha$ mean and $\alpha$ mode features have the least positive impact on classification performance.

The use of the odd bounce polarization response corresponds to the highest classification performance. Of all the feature vectors with a single feature type, feature vector three, with only the $k_o$ mean feature type, shows the highest classification performance. The inclusion of $k_o$ feature types always improves the classification performance of the feature vector.

The use of multiple feature types corresponds to the highest classification performance. Feature vector 17, with $k_e$ mode, $k_o$ mode, and HOG feature types, shows the highest overall classification performance. Feature vector 16, with $k_e$ mean, $k_o$ mean, and HOG feature types, shows the second highest overall classification performance.

The improvement of classification performance with HOG normalization is reported in Table 3.6. The use of HOG normalization on the HOG feature types corresponds to a minimal improvement in classification performance. Unlike the improvements reported in Table 3.4 using LDA, the use of normalization with HOG reported in Table 3.6 yields at best an improvement of 4 percent correct classification. Some of the feature vectors showed a decrease in classification performance with the use of HOG normalization (e.g., feature vectors 8, 9, 16, and 17).

Table 3.5: Feature Vector Classification Results Using RVM Without HOG Normalization.


Table  3.6:  Feature  Vector  Classification  Results  Using  RVM  With  HOG  Normalization. 


3.6  Conclusions 


We report a similar trend in classification performance for both LDA and RVM. The use of frequency response features uniformly reduced classification performance. The use of polarization response features and the use of HOG features correlate with the best performance of the feature vectors evaluated. Notably, the combination of feature types in feature vector 17 corresponds to the best performance of all the feature vectors evaluated.

3.7  Saliency  of  Features 

A feature vector formed from a subset of all features results in the highest classification performance. We ask three questions. Is there a subset of features with optimal separation between classes? If there is an optimal subset of features, then how do we identify it? If we identify an optimal subset, then what can we learn from it?

The  subset  of  features  whose  corresponding  classification  performance  is  optimal, 
compared  to  all  the  permutations  of  the  overall  features  set,  may  be  taken  as  the  salient 
set  of  features.  Saliency  of  feature  types  is  measured  by  the  change  in  classification 
performance  from  the  removal  of  the  features  from  the  feature  type.  Chapter  IV  evaluates 
the  saliency  of  the  cell  method  features  using  the  process  of  backward  selection. 

IV. Application of Backward Selection on Cell Method Features


The  goal  of  Chapter  IV  is  to  evaluate  the  saliency  of  features  using  cell  method  features 
from  Section  3.4  and  the  CV  Domes  data  set.  Section  4.1  introduces  backward  selection 
for  feature  subset  selection.  In  Section  4.2,  backward  selection  is  applied  to  the  cell  method 
features.  Section  4.3  reports  the  results  of  backward  selection  on  the  cell  method  features. 

4.1  Backward  Selection 

Backward  selection  is  a  process  for  subset  selection.  Subset  selection  is  a  process  of 
finding  the  smallest  number  of  dimensions  of  a  feature  set  that  contribute  the  most  to  the 
accuracy  of  the  classifier  [22].  Backward  selection  starts  with  all  available  features  and 
removes  them  one  by  one.  Within  each  iteration,  the  candidate  feature  whose  removal 
decreases  the  classification  error  the  most  is  left  out  of  the  feature  subset  on  the  next 
iteration  of  backward  selection  [22].  Backward  selection  iterates  and  is  complete  when 
the  removal  of  features  no  longer  reduces  the  error  in  classification. 
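
The procedure described above can be sketched as a short greedy loop. This is a minimal illustration, not the implementation used in this thesis: the `score` callable, the toy weights, and the feature names are hypothetical stand-ins for training and testing a classifier on a candidate feature subset.

```python
def backward_selection(features, score):
    """Greedy backward selection over a list of feature identifiers.

    `score` is a caller-supplied function (hypothetical here) that maps a
    list of features to percent correct classification; higher is better.
    Returns the selected features and the best score after each removal.
    """
    current = list(features)
    best = score(current)
    history = [best]
    while len(current) > 1:
        # Score every candidate subset that omits exactly one feature.
        trials = [(score(current[:i] + current[i + 1:]), i)
                  for i in range(len(current))]
        candidate_score, index = max(trials)
        if candidate_score <= best:
            break  # no single removal reduces the classification error
        best = candidate_score
        del current[index]
        history.append(best)
    return current, history


# Toy score with made-up weights: negative weights mimic harmful features.
WEIGHTS = {"k_o_mode": 10.0, "k_e_mode": 6.0, "hog_amp": 8.0,
           "alpha_mean": -4.0, "alpha_mode": -2.0}


def toy_score(subset):
    return 70.0 + sum(WEIGHTS[name] for name in subset)


selected, history = backward_selection(WEIGHTS, toy_score)
print(selected)  # ['k_o_mode', 'k_e_mode', 'hog_amp']
print(history)   # [88.0, 92.0, 94.0]
```

In this toy case the two harmful features are removed first, and the loop stops once every remaining removal would lower the score.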

Backward selection is computationally expensive, but systematically converges on a salient set of features. For each feature removed, the classification performance must be evaluated for each of the remaining features [22]. To reduce the dimension of the set of features from $N$ to $N - r$ features, classification performance must be evaluated $N + (N-1) + (N-2) + \cdots + (N-r+2) + (N-r+1)$ times [22]. Backward selection is more efficient than completing a grid search of all permutations of the feature set, which would require $\binom{N}{N-r}$ evaluations.
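
To give a feel for the difference in cost, the following sketch counts evaluations for both approaches. It assumes the exhaustive search visits every distinct subset of size $N - r$, counted by the binomial coefficient; the values of $N$ and $r$ are hypothetical and chosen only for illustration. The backward selection sum also has the closed form $r(2N - r + 1)/2$.

```python
import math


def backward_evaluations(n_features, n_removed):
    """N + (N-1) + ... + (N-r+1): one evaluation per candidate removal."""
    return sum(n_features - k for k in range(n_removed))


def exhaustive_evaluations(n_features, n_removed):
    """Number of distinct subsets of size N - r (assumed search space)."""
    return math.comb(n_features, n_features - n_removed)


# Hypothetical sizes: reduce 20 features to 15.
print(backward_evaluations(20, 5))    # 90
print(exhaustive_evaluations(20, 5))  # 15504
```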

4.2  Implementation  of  Backward  Selection 

4.2.1  Rules. 

To  apply  backward  selection  to  the  cell  method  feature  vectors,  the  following  rules  are 
adopted: 
• Treat histogram bins from HOG as a single “feature.” The HOG feature types contribute nine features per cell, one for each histogram bin. Treating the nine features from a HOG histogram as a single “feature” permits a single feature type to be removed from a cell on each iteration of backward selection.

•  The  percent  correct  classification  performance  metric  from  the  analysis  of  the  cell 
method  introduced  in  Section  3.4  is  used  to  evaluate  the  impact  of  removing  a  feature. 
The  feature  whose  removal  results  in  the  highest  percent  correct  classification  is 
permanently  removed  from  the  feature  set. 

•  Use  the  RVM  classifier  to  evaluate  the  change  in  classification  performance  for 
the  removal  of  a  feature  because  the  RVM  classifier  has  a  superior  classification 
performance  over  LDA,  as  reported  in  Chapter  III. 
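
The first rule can be illustrated by mapping each removable "feature" to the columns it occupies in the feature vector. The sketch below is an assumption about bookkeeping only: the number of cells, the column ordering, and the names are hypothetical and are not taken from the implementation in this thesis.

```python
def build_feature_groups(n_cells, scalar_types, hog_types, n_bins=9):
    """Map each removable "feature" to the column indices it occupies.

    Scalar feature types occupy one column per cell; each HOG feature type
    occupies `n_bins` columns per cell but is removed as a single unit.
    """
    groups = {}
    column = 0
    for cell in range(1, n_cells + 1):
        for name in scalar_types:
            groups[(cell, name)] = [column]
            column += 1
        for name in hog_types:
            groups[(cell, name)] = list(range(column, column + n_bins))
            column += n_bins
    return groups, column


def columns_after_removal(groups, removed):
    """Column indices kept once the listed (cell, type) groups are removed."""
    return sorted(c for key, cols in groups.items()
                  if key not in removed for c in cols)


# Hypothetical layout: 4 cells, six scalar types, one HOG type.
SCALARS = ["alpha_mean", "alpha_mode", "k_e_mean", "k_e_mode",
           "k_o_mean", "k_o_mode"]
groups, n_columns = build_feature_groups(4, SCALARS, ["hog_amp"])
kept = columns_after_removal(groups, {(2, "hog_amp")})
print(len(groups), n_columns, len(kept))  # 28 60 51
```

Removing the HOG group of one cell drops all nine of its histogram columns at once, so each iteration of backward selection still removes exactly one feature type from one cell.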

4.2.2  Method. 

The same process as the analysis of the cell method introduced in Section 3.4 is used to evaluate the classification performance associated with removing different features. All the permutations of the feature set, denoted as $FS_n$, with a single feature removed are the feature vectors, denoted as $FV_v$, $v \in [1, N - n + 1]$. The classifier is trained and tested $T$ times for each of the feature vectors, $FV_v$. The mean of the classification performance across all $T$ trials for a feature vector, $FV_v$, is the classification performance, $P_{c_v}$, of the feature vector.
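
One iteration of this evaluation can be sketched as follows. The `run_trial` callable is a hypothetical stand-in for a single train-and-test run of the classifier, and the toy trial function and its numbers are invented purely to make the example runnable.

```python
import random
from statistics import mean


def leave_one_out(feature_set):
    """All feature vectors formed by removing one feature from the set."""
    return [feature_set[:i] + feature_set[i + 1:]
            for i in range(len(feature_set))]


def mean_performance(run_trial, feature_vector, trials, seed=0):
    """Average percent correct over `trials` independent runs."""
    rng = random.Random(seed)
    return mean(run_trial(feature_vector, rng) for _ in range(trials))


def best_removal(feature_set, run_trial, trials):
    """Return the leave-one-out vector with the highest mean performance."""
    vectors = leave_one_out(feature_set)
    scores = [mean_performance(run_trial, fv, trials) for fv in vectors]
    best = max(range(len(vectors)), key=scores.__getitem__)
    return vectors[best], scores[best]


def toy_trial(feature_vector, rng):
    # Made-up behavior: one harmful feature plus bounded random noise.
    penalty = 5.0 if "alpha_mean" in feature_vector else 0.0
    return 80.0 - penalty + rng.uniform(-1.0, 1.0)


start = ["alpha_mean", "k_e_mode", "k_o_mode", "hog_amp"]
next_set, next_score = best_removal(start, toy_trial, trials=30)
print(next_set)
```

Averaging over the trials is what separates a genuine difference between candidate vectors from trial-to-trial noise, so the number of trials bounds how small a difference can be resolved.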

A wrapper is added to the cell method introduced in Section 3.4 to systematically remove features one at a time. To systematically reduce the dimension of the feature set, $FS_n$, the wrapper permanently removes one feature from the feature set on each iteration and redefines all permutations, $FV_v$, of the feature set, $FS_{n+1}$. Figure 4.1 summarizes the process used to execute backward selection.

Figure 4.1: Block Diagram of the Backward Selection Process. Backward selection uses the same process as the cell method to evaluate the classification performance of feature vectors.


To stay within computational limits, fewer states are used than in the analysis of the cell method implemented in Section 3.3. The states are sampled in elevation every three degrees, as opposed to the 0.25 degrees used in the analysis of the cell method. With the reduced sampling of the states in elevation, the classification performance from the cell method decreases to the performance reported in Table 4.1.

Backward selection is initialized using the set of features in feature vector 20 of Table 4.1. The feature types in feature vector 20 are $\alpha$ mean, $\alpha$ mode, $k_e$ mean, $k_e$ mode, $k_o$ mean, $k_o$ mode, and HOG of amplitude. We do not initialize with feature vector 21 because the inclusion of the HOG of $k_e$ and the HOG of $k_o$ feature types increases the number of features by 1800. The additional 1800 features almost double the number of permutations of $FS_n$ to be evaluated. Doubling the number of permutations doubles the computational resources required for each iteration of backward selection. Also, because feature vector 17 has a higher corresponding classification performance, we know there is a more salient subset of features within feature vector 20.

Table 4.1: Feature Vector Classification Results Using RVM With HOG Normalization, 3 Degree Elevation Sampling.

4.2.3  Metrics. 

The  comparison  of  feature  type  saliency  is  based  on  the  change  in  classification 
performance  from  the  removal  of  the  features  corresponding  to  each  feature  type.  Saliency 
of  a  feature  type  is  expressed  as 


$$\text{Saliency}_{FT} = \frac{\Delta CP_{FT}}{F_{FT}} \qquad (4.1)$$

where $\Delta CP_{FT}$ is the overall change in percent correct classification from the removal of the features of the feature type, and $F_{FT}$ is the number of features removed of the feature type.

The  comparison  of  cell  saliency  is  done  in  a  similar  fashion  as  the  saliency  of  feature 
types.  The  change  in  percent  correct  classification  from  the  removal  of  the  features 
corresponding  to  each  cell  is  the  metric  used.  Saliency  of  a  cell  is  expressed  as 

$$\text{Saliency}_{c} = \frac{\Delta CP_{c}}{C_{c}} \qquad (4.2)$$

where $\Delta CP_c$ is the overall change in classification from the removal of the features from the cell, and $C_c$ is the number of features removed from the cell.
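
Both metrics reduce to the same computation: total change in percent correct divided by the number of features removed, grouped either by feature type or by cell. The sketch below assumes a removal log in which each entry records the change attributed to one removed feature; the log format, the names, and the values are hypothetical.

```python
from collections import defaultdict


def saliency_by(removal_log, key):
    """Total change in percent correct per group / features removed."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entry in removal_log:
        totals[entry[key]] += entry["delta"]
        counts[entry[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up log: one entry per removed feature.
LOG = [
    {"cell": 16, "feature_type": "alpha_mode", "delta": 1.5},
    {"cell": 2, "feature_type": "alpha_mean", "delta": 2.0},
    {"cell": 2, "feature_type": "k_o_mode", "delta": 0.25},
    {"cell": 16, "feature_type": "alpha_mean", "delta": 1.0},
]

by_type = saliency_by(LOG, "feature_type")
by_cell = saliency_by(LOG, "cell")
print(by_type)  # {'alpha_mode': 1.5, 'alpha_mean': 1.5, 'k_o_mode': 0.25}
print(by_cell)  # {16: 1.25, 2: 1.125}
```

Under this convention a larger positive value means that removing the group improved classification more, so the group is less salient.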


4.3  Results 

Using the method from Section 4.2, we executed backward selection on the feature set from feature vector 20 reported in Table 4.1. The process of backward selection iterated 204 times to select a set of features that is more salient than the feature vectors reported in Table 4.1. The reported metrics indicate the least salient feature type is $\alpha$ mean, and the most salient feature type is $k_o$ mode. The highest classification performance of each iteration of backward selection is shown in Figure 4.2.

Application of backward selection improves the classification performance, showing an increase from 77.23 to 84.28 percent correct classification. The first iteration achieved a classification performance of 78.64 percent correct classification with the removal of an $\alpha$ mode feature from cell 16.

Figure 4.2: Tracking of the Highest Classification Performance of Each Iteration of Backward Selection Using RVM and 30 Trials.

The first 20 iterations eliminated […] of the frequency response features, […] of the even bounce polarization response features, […] of the odd bounce polarization response features, and […] of the HOG feature sets, with an increase of three percent correct classification. The second 20 iterations eliminated none of the frequency response features, […] of the remaining even bounce polarization response features, […] of the remaining odd bounce polarization response features, and […] of the remaining HOG feature sets, with no significant gain in percent correct classification. The next 40 iterations eliminated […] of the remaining frequency response features, […] of the remaining even bounce polarization features, […] of the remaining odd bounce polarization features, and […] of the remaining HOG feature sets, with an increase of five percent correct classification from the baseline. The next 40 iterations eliminated […] of the remaining frequency response features, […] of the remaining even bounce polarization features, […] of the remaining odd bounce polarization features, and […] of the remaining HOG feature sets, with an increase to 82.32 percent correct classification. The next 60 iterations eliminated […] of the remaining frequency response features, […] of the remaining even bounce polarization features, […] of the remaining odd bounce polarization features, and […] of the remaining HOG feature sets, with no significant increase in percent correct classification. The final 24 iterations eliminated […] of the remaining frequency response features, […] of the remaining even bounce polarization features, […] of the remaining odd bounce polarization features, and […] of the remaining HOG feature sets, with an increase to the peak performance of 84.28 percent correct classification. From the analysis of the removed features and the change in classification performance, the saliency metric of feature types is shown in Figure 4.3.


Figure  4.3:  Comparison  of  Feature  Type  Saliency. 

The least salient feature type is $\alpha$ mean. The greatest improvement in classification performance was made through the removal of the $\alpha$ mean features. Removal of any other feature type results in an increase in classification performance of less than 0.00032 percent correct classification on average. The increase in classification performance associated with the removal of the $\alpha$ mean features indicates that the feature type had a negative impact on classification performance. The lack of saliency associated with $\alpha$ mean is consistent with the results from Chapter III.

The most salient feature type is $k_o$ mode. The smallest improvement in classification performance was made through the removal of $k_o$ mode features. Removal of the $k_o$ mode type of features resulted in an increase of $7.79 \times 10^{-5}$ percent correct classification on average. The saliency of the $k_o$ mode feature type is consistent with the results from Chapter III.


Figure  4.4:  Comparison  of  Cell  Saliency. 

The saliency of cells is reported using a metric similar to that used for the saliency of feature types and is shown in Figure 4.4. The least salient cells are 1, 2, 5, 6, 7, 13, 18, and 31. Removal of features from these cells resulted in an increase of more than 0.00090 percent correct classification on average. The increase in classification performance associated with the removal of these features indicates that the features have a negative impact on classification performance. Removal of features from all other cells resulted in less improvement to classification performance than the removal of features from cell 18.

The  most  salient  cells  are  16,  22,  and  35.  Removal  of  features  from  cells  16,  22,  and  35 
resulted  in  a  decrease  of  more  than  0.00062  percent  correct  classification  on  average.  The 
overall  decrease  in  classification  performance  associated  with  the  removal  of  features  is  an 
indication  that  the  features  have  a  positive  impact  on  classification  performance.  Results 
show  saliency  for  some  cells. 

Due  to  the  symmetry  of  target  vehicles,  the  saliency  of  the  cells  should  also  be 
symmetric.  However,  results  indicate  that  the  cell  saliency  is  not  symmetric.  The  lack  of 
symmetry  of  the  cells  is  an  indication  that  the  removal  of  non-salient  features  is  hindered  by 
the  variance  in  classification  performance.  The  variance  in  classification  performance  for 
the  initial  set  of  features  is  shown  in  Figure  4.5.  Variance  in  classification  performance  at 
30  trials  is  much  greater  than  the  maximum  change  observed  in  classification  performance 
for  each  iteration.  The  maximum  change  in  classification  performance  is  0.0045  percent 
correct  classification.  The  hypothesis  is  that  the  ambiguity,  resulting  from  variance,  causes 
the  wrong  features  to  be  removed  and  leads  to  the  lack  of  symmetry  in  the  saliency  of  the 
cells  shown  in  Figure  4.4. 

Figure 4.5: Variance in Classification Performance of Feature Vector 20.

V. Conclusions and Future Work


5.1  Conclusions 

The cell method for feature extraction extracts features from SAR images that relate to the structure of physical elements of the target and captures image attributes from an entire image in a single feature vector. We implemented backward selection on the set of features formed with the cell method. Backward selection identified a set of features more salient than the initial set of features. The initial set had a classification performance of 77.23 percent correct classification, and the set of features selected by backward selection has a classification performance of 84.28 percent correct classification.

Backward selection selected a set of features that is not the most salient set of features. The set of features selected is more salient than the initial set in feature vector 20, but the classification performance did not increase monotonically. We expect a monotonic increase in the classification performance with the removal of non-salient features [22]. Additionally, contrary to our expectations, the saliency of cells was not symmetric. We hypothesize that a reduction in classification variance will result in the selection of a more salient set of features. Completely eliminating the impact of the variance would result in the selection of the most salient set of features. High variance in the classification performance limits the performance of backward selection. Despite the high variance, the saliency of feature types is consistent with the results from the pixel method and the cell method, supporting the use of the saliency metric.

Analysis in Sections 3.2, 3.5, and 4.3 shows the frequency response attribute, $\alpha$, is the least informative attribute for classifying SAR images from the CV Domes data set. In the pixel method results, the inclusion of $\alpha$ in a feature vector decreased the corresponding classification performance of the feature vector. In the cell method results, the $\alpha$ feature type has the lowest corresponding classification performance of all the feature types. Finally, the saliency metric from the result of backward selection indicates the $\alpha$ feature type is the least salient feature type.

Analysis in Sections 3.5 and 4.3 shows that a combination of polarization response features and image amplitude features forms the most salient set of features for classifying SAR images from the CV Domes data set. In the pixel method analysis, the inclusion of polarization and location features corresponded to the highest classification performance. In the cell method, the combination of polarization and HOG of amplitude feature types corresponded to the highest classification performance. The salient nature of amplitude features is expected, as previous work on SAR ATR focuses on the amplitude of images [5, 6]. The salient nature of polarization features supports the incorporation of polarization information into SAR ATR.

5.2  Future  Work 

There are additional directions this research may take to follow what has been performed here. The directions span from a continuation of the work to the application of the research. Future work may look into the variance in the classification performance, verify the extraction of the $\alpha$ attribute, review the SVM classifier for SAR ATR, and apply polarization attributes to other SAR ATR algorithms such as those in [3, 5, 6].

Future work should look into ways to manage or reduce the variance. We identified the variance in the classification performance as a driver for missing the most salient set of features in the execution of backward selection. One way to manage the variance is to use a greater number of trials in evaluating the classification performance. Using a greater number of target states may also reduce the variance. Both of these methods will require greater computational resources than were used in this thesis.
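
The trade-off between trials and variance can be estimated with a short calculation. The sketch below assumes the trials are independent, so the standard error of the mean classification performance falls as $\sigma / \sqrt{T}$; the standard deviation and target values are hypothetical and are not measurements from this thesis.

```python
import math


def standard_error(std_dev, trials):
    """Standard error of the mean over independent trials."""
    return std_dev / math.sqrt(trials)


def trials_needed(std_dev, target_se):
    """Smallest number of trials whose standard error meets the target."""
    return math.ceil((std_dev / target_se) ** 2)


# Hypothetical per-trial standard deviation of 2.0 percent correct.
print(round(standard_error(2.0, 30), 4))
print(trials_needed(2.0, 0.25))  # 64
```

Because the required number of trials grows with the square of the desired reduction in standard error, resolving very small per-iteration changes by adding trials alone quickly becomes expensive.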

The extraction of the $\alpha$ attribute should be investigated for accuracy. The poor performance of the $\alpha$ attribute may be attributed to either a bad extraction of the attribute or a lack of consistency in the curvature of the physical elements of the vehicles. For future work to use the $\alpha$ attribute, the method for extracting the attribute should be verified.

Given  the  great  amount  of  complexity  in  the  SAR  ATR  problem,  it  may  be 
advantageous  to  use  the  SVM  classifier  instead  of  the  RVM  classifier.  SVM  uses  a  greater 
number  of  support  vectors  than  RVM.  The  greater  number  of  support  vectors  allows  for  a 
hyperplane  to  mold  to  the  high  complexity  of  the  feature  space. 

The polarization attributes, or a variation of them, should be applied to other SAR ATR algorithms. Results from the cell method, the pixel method, and backward selection identified the saliency of the polarization attributes in this thesis. The inclusion of polarization attributes may significantly improve the performance of the algorithms.